PhEDEx and OSG down.

Description

It is 07:44 and our PhEDEx Production Component Status has been down for 10h25min. The OSG site also returned the following results from its tests:
 Authentication:   Pass    2006-10-09 09:04:29 GMT
Hello World:    Fail
Command:
globus-job-run spgrid.if.usp.br:2119 /bin/sh -c "echo Hello World ; echo Hello_World_DONE"
Reason:
Timeout ; output : /usr/local/globusc/globus/bin/globus-job-run: line 1: 18198 Killed /usr/local/globusc/globus/bin/globusrun -q -o -r "spgrid.if.usp.br:2119" -f /tmp/globus_job_run.osggridcat.rsl.18125 ; status : 246       2006-10-09 09:06:09 GMT
CONDOR Batch System:     
 -Batch Query:  Pass    2006-10-09 09:04:35 GMT
 -Batch Sub:    Pass    2006-10-09 09:04:35 GMT
 -Batch Cancel:         Fail
Command:
globus-job-clean -force -r spgrid.if.usp.br:2119/jobmanager-fork https://spgrid.if.usp.br:_port_range_port_/number1/number2
Reason:
Unknown ; output: Could not clean up job. ; status: 245         2006-10-09 09:04:36 GMT
gsiftp:         Pass    2006-10-09 09:06:14 GMT
Web Service Hello World:        Pass    2006-10-09 09:07:17 GMT
 

Updates

I am going to restart the PhEDEx service. As far as I can tell, the grid proxy is valid:

[root@spdc00 root]# su - phedex
[phedex@spdc00 phedex]$ grid-proxy-info
subject  : /DC=org/DC=doegrids/OU=People/CN=Eduardo Gregores 407221/CN=proxy/CN=proxy/CN=proxy
issuer   : /DC=org/DC=doegrids/OU=People/CN=Eduardo Gregores 407221/CN=proxy/CN=proxy
identity : /DC=org/DC=doegrids/OU=People/CN=Eduardo Gregores 407221
type     : full legacy globus proxy
strength : 1024 bits
path     : /home/phedex/gridcert/proxy.cert
timeleft : 11:16:16
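
The validity check above was done by reading the `timeleft` line by eye. As a minimal sketch, the same line could be turned into seconds so that a script can compare it against a threshold. The function name `proxy_timeleft_seconds` is hypothetical, and the parsing assumes the `timeleft : H:MM:SS` layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read grid-proxy-info output on stdin and print the
# remaining proxy lifetime in seconds, taken from the "timeleft : H:MM:SS" line.
# Exits non-zero if no timeleft line is present.
proxy_timeleft_seconds() {
    awk '$1 == "timeleft" {
             split($3, t, ":")                  # t[1]=hours, t[2]=minutes, t[3]=seconds
             print t[1] * 3600 + t[2] * 60 + t[3]
             found = 1
         }
         END { exit !found }'
}
```

A possible use would be `grid-proxy-info | proxy_timeleft_seconds`, with the result compared against whatever minimum lifetime the site considers safe.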
So I stopped and restarted the Prod agents:
[phedex@spdc00 phedex]$  Master -config ~/SITECONF/local/PhEDEx/Config.Prod stop
[phedex@spdc00 phedex]$  Master -config ~/SITECONF/local/PhEDEx/Config.Prod start
FileDownload: pid 29035 already running in /home/phedex/state/download-master-prod
FileDiskExport: pid 29041 already running in /home/phedex/state/exp-disk-prod
InfoDropStatus: pid 29047 already running in /home/phedex/state/info-ds-prod
FilePFNExport: pid 29053 already running in /home/phedex/state/exp-pfn-prod

but even at 08:27 the service had still not come back as UP. I restarted it again.
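
In the output above, `start` reports every Prod agent as "already running" right after `stop` was issued. One possible reading is that the agents had not exited yet when `start` ran. The following is only a sketch for checking whether the reported pids are still alive; the function name `report_running_agents` is hypothetical, and it assumes the `<Agent>: pid <N> already running in <dir>` message format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read Master output on stdin and, for each line of the
# form "<Agent>: pid <N> already running in <dir>", print "<Agent> <N> alive"
# or "<Agent> <N> gone". Lines in any other format are ignored.
# kill -0 sends no signal; it only tests whether the pid can be signalled, so
# run this as the user that owns the agents (otherwise "gone" may be wrong).
report_running_agents() {
    sed -n 's/^\([A-Za-z]*\): pid \([0-9][0-9]*\) already running in .*$/\1 \2/p' |
    while read -r agent pid; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "$agent $pid alive"
        else
            echo "$agent $pid gone"
        fi
    done
}
```

For example, the output of the `Master ... start` command could be piped into `report_running_agents` to see which of the listed agents are still present before attempting another start.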

[phedex@spdc00 phedex]$  tail -n 10 /home/phedex/logs/download-master
2006-09-30 22:01:30: FileDownload[6579]: xstats: to=T2_SPRACE_Buffer from=T1_CERN_Load fileid=3610 state=100 size=2074217787 time_assigned=3856.96 time_all=2835.94 time_preclean=0.22 time_transfer=594.32 time_validate=2223.72 time_postclean=7.23 lfn=/store/test/2006/06/16/IntegrationLargeSample/0000/LoadTest_T1_CERN_0070 from_pfn=srm://srm.cern.ch:8443/srm/managerv1?SFN=/castor/cern.ch/cms/store/test/2006/06/16/IntegrationLargeSample/0000/LoadTest_T1_CERN_0070 to_pfn=srm://spdc00.if.usp.br:8443/srm/managerv1?SFN=/pnfs/if.usp.br/data/cms/store/test/2006/06/16/IntegrationLargeSample/0000/LoadTest_T1_CERN_0070
2006-09-30 22:01:31: FileDownload[6579]: Stopped all pending jobs
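
The last entry in this log is dated 2006-09-30, several days before the Oct 9 entries in the gatekeeper log. A rough sketch for measuring how stale the last log line is follows. The function name `log_age_seconds` is hypothetical; it assumes GNU `date` (for `-d`) and log lines that begin with a `YYYY-MM-DD HH:MM:SS` timestamp, and it treats both times as UTC, which is an assumption about the log's time zone.

```shell
#!/bin/sh
# Hypothetical helper: print the number of seconds between the timestamp at
# the start of a log line and a reference time.
#   $1  log line beginning with "YYYY-MM-DD HH:MM:SS"
#   $2  reference time in the same "YYYY-MM-DD HH:MM:SS" form
# Both times are interpreted as UTC (assumption). Requires GNU date.
log_age_seconds() {
    stamp=$(printf '%s\n' "$1" | cut -c1-19)     # first 19 characters = timestamp
    then_s=$(date -u -d "$stamp" +%s) || return 1
    now_s=$(date -u -d "$2" +%s) || return 1
    echo $((now_s - then_s))
}
```

It could be called as `log_age_seconds "$(tail -n 1 /home/phedex/logs/download-master)" "$(date -u '+%Y-%m-%d %H:%M:%S')"`; a large value would suggest that the agent has not written to that log for a long time.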

The spgrid log for the problems with OSG monitoring shows:

[mdias@spgrid mdias]$ tail -f /OSG/globus/var/globus-gatekeeper.log
 PID: 25742 -- Notice: 5:           and local gid: 524
TIME: Mon Oct  9 08:28:32 2006
 PID: 25742 -- Notice: 0: executing /usr/local/opt/OSG/globus/libexec/globus-job-manager
TIME: Mon Oct  9 08:28:32 2006
 PID: 25742 -- Notice: 0: GATEKEEPER_JM_ID 2006-10-09.08:28:32.0000025742.0000000000 for /DC=org/DC=doegrids/OU=People/CN=Leigh Grundhoefer (GridCat) 693100 on 129.79.4.64
TIME: Mon Oct  9 08:28:32 2006
 PID: 25742 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=11
TIME: Mon Oct  9 08:28:32 2006
 PID: 25742 -- Notice: 0: Child 25771 started
sh: line 1: /var/tmp/gratia.log: Permission denied
which looks normal. I am going to try restarting SC4:
[phedex@spdc00 phedex]$  Master -config ~/SITECONF/local/PhEDEx/Config.SC4 start
FileDownload: removing old stop flag /home/phedex/state/download-master/stop
FileDownload: pid 19841 started in /home/phedex/state/download-master
FileDiskExport: removing old stop flag /home/phedex/state/exp-disk/stop
FileDiskExport: pid 19847 started in /home/phedex/state/exp-disk
InfoDropStatus: removing old stop flag /home/phedex/state/info-ds/stop
InfoDropStatus: pid 19853 started in /home/phedex/state/info-ds
FilePFNExport: removing old stop flag /home/phedex/state/exp-pfn/stop
FilePFNExport: pid 19859 started in /home/phedex/state/exp-pfn
FileRecycler: removing old stop flag /home/phedex/state/download-recycle/stop
[phedex@spdc00 phedex]$ FileRecycler: pid 19865 started in /home/phedex/state/download-recycle

[phedex@spdc00 phedex]$  Master -config ~/SITECONF/local/PhEDEx/Config.Prod start
FileDownload: pid 29035 already running in /home/phedex/state/download-master-prod
FileDiskExport: pid 29041 already running in /home/phedex/state/exp-disk-prod
InfoDropStatus: pid 29047 already running in /home/phedex/state/info-ds-prod
FilePFNExport: pid 29053 already running in /home/phedex/state/exp-pfn-prod
[phedex@spdc00 phedex]$ tail -n 20 /home/phedex/logs/download-master
2006-09-30 22:01:31: FileDownload[6579]: Stopped all pending jobs
2006-10-09 11:35:54: FileDownload[19841]: (re)connecting to database

UPDATE

Eduardo solved the problem.
Topic revision: r2 - 2006-10-11 - MarcoAndreFerreiraDias
 
