Job Robots Troubleshooting
Description
NO_DWLOD: Cannot download default.tgz from gsiftp
SAM tests were also affected. In fact, we could not log in with uberftp:
uberftp wms213.cern.ch
530-Login incorrect. : globus_gss_assist: Error invoking callout
530-globus_callout_module: The callout returned an error
530-an unknown error occurred
530 End.
We checked whether the CERN CRLs had expired (there is a probe for this among the OSG-RSV tests,
https://osg-ce.sprace.org.br:8443/rsv/ ); they were fine. We then found a problem in our
$VDT_LOCATION/glite/etc/vomses
file: its cms entry pointed to a non-existent machine. We restored this file from backup.
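A quick way to catch this kind of stale entry is to parse the vomses file and try to resolve each VOMS host. The Python sketch below is only an illustration: it assumes the usual layout of five double-quoted fields per line ("alias" "host" "port" "server DN" "VO name"), and parse_vomses and unresolvable_hosts are hypothetical helpers, not part of VDT or gLite. It only detects host names that do not resolve; a host that resolves but runs no VOMS server would not be caught.

```python
import shlex
import socket
from typing import Callable, List, Tuple

# One vomses line (assumed layout): "alias" "host" "port" "server DN" "VO name"
VomsesEntry = Tuple[str, str, int, str, str]


def parse_vomses(text: str) -> List[VomsesEntry]:
    """Parse vomses lines, skipping blank lines and '#' comments."""
    entries: List[VomsesEntry] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # removes the double quotes around each field
        if len(fields) < 5:
            raise ValueError(f"malformed vomses line: {line!r}")
        alias, host, port, dn, vo = fields[:5]
        entries.append((alias, host, int(port), dn, vo))
    return entries


def unresolvable_hosts(
    entries: List[VomsesEntry],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return the VOMS host names that cannot be resolved."""
    bad: List[str] = []
    for _alias, host, _port, _dn, _vo in entries:
        try:
            resolve(host)
        except OSError:  # socket.gaierror derives from OSError
            bad.append(host)
    return bad
```

For example, read $VDT_LOCATION/glite/etc/vomses, keep only the entries whose VO name is cms, and pass them to unresolvable_hosts; an empty list means every listed host at least resolves in DNS. The resolve parameter allows the DNS lookup to be replaced by a stub when testing.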
The problem was completely resolved once we provided, on all worker nodes, the correct link to
globus/TRUSTED_CA
inside their
GLOBUS_LOCATION
(
/opt/OSG-wn-client
), which is where the worker nodes look for their CRLs.
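To confirm the link on a worker node, a small check like the one below can help. It is a sketch that assumes the CRLs follow the Globus naming convention <hash>.r0 inside the directory the link points to; check_trusted_ca is a hypothetical helper, and the exact link path under /opt/OSG-wn-client has to be taken from the local installation.

```python
import glob
import os
from typing import List


def check_trusted_ca(link_path: str) -> List[str]:
    """Return the problems found with a TRUSTED_CA link (empty list if it looks fine)."""
    problems: List[str] = []
    if not os.path.islink(link_path):
        problems.append(f"{link_path} is not a symbolic link")
    target = os.path.realpath(link_path)  # follow the link (and any nested links)
    if not os.path.isdir(target):
        problems.append(f"{link_path} does not resolve to a directory ({target})")
        return problems
    # Assumed Globus convention: CRL files are named <hash>.r0
    if not glob.glob(os.path.join(target, "*.r0")):
        problems.append(f"no CRL files (*.r0) found in {target}")
    return problems
```

Running it on each worker node (for example from a simple loop over the nodes) and printing any non-empty result makes a missing or dangling link obvious before jobs start failing.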
BrokerHelper: no compatible resources
All JobRobot jobs were aborted at our site, as shown on the page
http://jobrobot.web.cern.ch/JobRobot/aborted_081019.html#T2_BR_SPRACE
with the errors:
BrokerHelper: no compatible resources
request expired
First we checked for corruption in our CMSSW installation by running a CRAB job with the same CMSSW version listed in
http://jobrobot.web.cern.ch/JobRobot/summary_081019.html
following the instructions at /twiki/bin/view/Main/EntryDescriptionNo53 .
This error may be related to an ambiguous BDII publication that makes the matchmaking fail on the requirement
Member("osg-se.sprace.org.br",other.GlueCESEBindGroupSEUniqueID)
Our BDII contained:
objectClass: GlueSchemaVersion
GlueCESEBindGroupCEUniqueID: osg-ce.sprace.org.br:2119/jobmanager-condor-cms
GlueCESEBindGroupSEUniqueID: osg-se.sprace.org.br
GlueCESEBindGroupSEUniqueID: osg-se.sprace.org.br
GlueSchemaVersionMajor: 1
Note that
GlueCESEBindGroupSEUniqueID: osg-se.sprace.org.br appears twice. To remove the duplicate, we had to fix GIP, which collects the information passed to CEMon.
Editing the file
/OSG/gip/var/ldif/osg-info-static-cesebind.ldif
directly does not seem to work.
Instead, we changed gip-attributes.conf, which configure_gip reads to generate this file, and then reran configure_gip:
vim /OSG/monitoring/gip-attributes.conf
OSG_GIP_DISK="0"
/OSG/vdt/setup/configure_gip
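If the same change has to be applied on several machines, the edit to gip-attributes.conf can be scripted instead of done by hand in vim. The Python sketch below assumes the file holds shell-style KEY="value" lines, as OSG_GIP_DISK="0" suggests, with no leading export or indentation; set_conf_value is a hypothetical helper. It only rewrites the text, so configure_gip still has to be run afterwards.

```python
import re


def set_conf_value(text: str, key: str, value: str) -> str:
    """Set KEY="value" in shell-style config text, appending the line if KEY is absent."""
    new_line = f'{key}="{value}"'
    # Match a whole line that assigns KEY (anchored per line via MULTILINE).
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement keeps backslashes in the value from being interpreted.
        return pattern.sub(lambda _m: new_line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"
```

In practice one would read /OSG/monitoring/gip-attributes.conf, call set_conf_value(text, "OSG_GIP_DISK", "0"), write the result back, and then run /OSG/vdt/setup/configure_gip as above.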
The fix can be verified by querying lcg-bdii.cern.ch:
ldapsearch -x -LLL -p 2170 -h lcg-bdii.cern.ch -b mds-vo-name=SPRACE,mds-vo-name=local,o=grid > jobrobot.txt
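Scanning jobrobot.txt by eye for a repeated GlueCESEBindGroupSEUniqueID is error-prone, so a short script can do it. This Python sketch splits the ldapsearch output into entries (unfolding the continuation lines that ldapsearch inserts into long lines) and reports every value that an entry repeats for a given attribute. The function names are hypothetical, and base64-encoded values (attr:: ...) are not decoded.

```python
from collections import Counter
from typing import List, Tuple

Entry = List[Tuple[str, str]]


def ldif_entries(text: str) -> List[Entry]:
    """Split LDIF text into entries made of (attribute, value) pairs."""
    # Unfold continuation lines: a line starting with one space continues the previous one.
    lines: List[str] = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    entries: List[Entry] = []
    current: Entry = []
    for line in lines:
        if not line.strip():  # a blank line ends the current entry
            if current:
                entries.append(current)
                current = []
            continue
        if line.startswith("#") or ":" not in line:
            continue
        attr, _, value = line.partition(":")
        current.append((attr, value.strip()))  # base64 "attr::" values are left as-is
    if current:
        entries.append(current)
    return entries


def duplicated_values(entries: List[Entry], attribute: str = "GlueCESEBindGroupSEUniqueID"):
    """Return (dn, value, count) for every value that an entry lists more than once."""
    report = []
    for entry in entries:
        dn = next((v for a, v in entry if a.lower() == "dn"), "<no dn>")
        # LDAP attribute names are case-insensitive, so compare them lowercased.
        counts = Counter(v for a, v in entry if a.lower() == attribute.lower())
        report.extend((dn, v, n) for v, n in counts.items() if n > 1)
    return report
```

For example, reading jobrobot.txt and calling duplicated_values(ldif_entries(text)) should give an empty list once the duplicate binding is gone, and a tuple such as (dn, "osg-se.sprace.org.br", 2) while it is still published.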
Updates
John Doe on dd/mm/yyyy
Describe what you did.
Jane Doe on dd/mm/yyyy
Further comments.
--
MarcoAndreFerreiraDias - 19 Oct 2008