Compressing the Logs on SPDC00
Description
We are going to free up more space on /var on SPDC00.
Stopping PhEDEx
[mdias@spdc00 mdias]$ su -
Password:
[root@spdc00 root]# su - phedex
[phedex@spdc00 phedex]$ grid-proxy-info
subject : /DC=org/DC=doegrids/OU=People/CN=Eduardo Gregores 407221/CN=proxy/CN=proxy/CN=proxy
issuer : /DC=org/DC=doegrids/OU=People/CN=Eduardo Gregores 407221/CN=proxy/CN=proxy
identity : /DC=org/DC=doegrids/OU=People/CN=Eduardo Gregores 407221
type : full legacy globus proxy
strength : 1024 bits
path : /home/phedex/gridcert/proxy.cert
timeleft : 0:00:00
[phedex@spdc00 phedex]$ Master -config ~/SITECONF/local/PhEDEx/Config.Prod stop
Stopping dCache
[root@spdc00 root]# /opt/pnfs/bin/pnfs stop
Shutting down pnfs services (PostgreSQL version):
Stopping Heartbeat .... Ready
Killing pnfsd Done
Killing pmountd Done
Killing dbserver . Done
Removing 8 Clients 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+
Removing 8 Servers 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+
Removing main switchboard ... O.K.
[root@spdc00 root]# /opt/d-cache/bin/dcache-core stop
Shutting down dcache services:
Pid File (/opt/d-cache/config/lastPid.gridftp-spdc00) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.gsidcap-spdc00) doesn't contain valid PID
Stopping srm-spdc00Domain (pid=22268) 0 1 2 3 4 5 6 7 Done
Pid File (/opt/d-cache/config/lastPid.replica) doesn't contain valid PID
Stopping utilityDomain (pid=21995) 0 1 2 3 4 5 6 7 8 Done
Stopping httpdDomain (pid=21909) 0 1 2 3 4 5 6 7 Done
Stopping infoProviderDomain (pid=22171) 0 1 2 3 4 5 6 7 8 Done
Stopping pnfsDomain (pid=22083) 0 1 2 3 4 5 6 7 Done
Stopping adminDoorDomain (pid=21830) 0 1 2 3 4 5 6 7 8 Done
Stopping doorDomain (pid=21755) 0 1 2 3 4 5 6 7 Done
Stopping dirDomain (pid=21682) 0 1 2 3 4 5 6 7 8 Done
Stopping dCacheDomain (pid=21605) 0 1 2 3 4 5 6 7 Done
Stopping lmDomain (pid=21542) 0 1 2 3 4 5 6 7 8 Done
Now on SPRaid
[root@spraid root]# /opt/d-cache/bin/dcache-core stop
Shutting down dcache services:
Stopping gridftp-spraidDomain (pid=26016) 0 1 2 3 4 5 6 7 Done
Pid File (/opt/d-cache/config/lastPid.gsidcap-spraid) doesn't contain valid PID
Stopping srm-spraidDomain (pid=26106) 0 1 2 3 4 5 6 7 Done
Pid File (/opt/d-cache/config/lastPid.replica) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.utility) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.httpd) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.infoProvider) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.pnfs) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.adminDoor) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.door) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.dir) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.dCache) doesn't contain valid PID
Pid File (/opt/d-cache/config/lastPid.lm) doesn't contain valid PID
[root@spraid root]# /opt/d-cache/bin/dcache-pool stop
Shutting down dcache pool: Stopping spraidDomain (pid=26283) 0 1 2 3 4 5 6 7 Done
Compressing the logs
[root@spdc00 log]# cd /var/log
[root@spdc00 log]# gzip srm-spdc00Domain.log
[root@spdc00 log]# mv srm-spdc00Domain.log.gz srm-spdc00Domain.log.gz.3
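The manual gzip and mv above can be sketched as a small helper that picks the next free numeric suffix on its own. This is only an illustration: the function name rotate_gz is hypothetical, and the sketch assumes that no process still has the log open (worth confirming with lsof first) and that no plain NAME.gz file already exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original procedure): compress a log
# and store it under the next unused numeric suffix, e.g. NAME.gz.3 when
# NAME.gz.1 and NAME.gz.2 already exist.
rotate_gz() {
    log=$1
    [ -f "$log" ] || { echo "no such file: $log" >&2; return 1; }

    # Find the first suffix that is not taken yet.
    n=1
    while [ -e "$log.gz.$n" ]; do
        n=$((n + 1))
    done

    # gzip replaces NAME with NAME.gz; then rename it to the numbered copy.
    gzip "$log" && mv "$log.gz" "$log.gz.$n" && echo "$log.gz.$n"
}

# Example (run from /var/log, with the services stopped):
#   rotate_gz srm-spdc00Domain.log
```

The function prints the name of the archive it created, so the result can be logged or moved elsewhere afterwards.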
Now we bring the services back up. First PhEDEx
[root@spdc00 root]# su - phedex
[phedex@spdc00 phedex]$ Master -config ~/SITECONF/local/PhEDEx/Config.Prod start
FileDownload: pid 4069 started in /home/phedex/state/download-master-prod
FileDiskExport: pid 4075 started in /home/phedex/state/exp-disk-prod
InfoDropStatus: pid 4081 started in /home/phedex/state/info-ds-prod
FilePFNExport: pid 4087 started in /home/phedex/state/exp-pfn-prod
As root
[root@spdc00 root]# /opt/pnfs/bin/pnfs start
Starting pnfs services (PostgreSQL version):
Shmcom : Installed 8 Clients and 8 Servers
Starting database server for admin (/opt/pnfsdb/pnfs/databases/admin) ... O.K.
Starting database server for data1 (/opt/pnfsdb/pnfs/databases/data1) ... O.K.
Waiting for dbservers to register ... Ready
Starting Mountd : pmountd
Starting nfsd : pnfsd
[root@spdc00 root]# /opt/d-cache/bin/dcache-core start
Starting dcache services:
Starting lmDomain 6 5 4 3 2 1 0 Done (pid=4302)
Starting dCacheDomain 6 5 4 3 2 1 0 Done (pid=4365)
Starting dirDomain 6 5 4 3 2 1 0 Done (pid=4442)
Starting doorDomain 6 5 4 3 2 1 0 Done (pid=4515)
Starting adminDoorDomain 6 5 4 3 2 1 0 Done (pid=4591)
Starting httpdDomain 6 5 4 3 2 1 0 Done (pid=4677)
Starting utilityDomain 6 5 4 3 2 1 0 Done (pid=4768)
Starting pnfsDomain 6 5 4 3 2 1 0 Done (pid=4862)
Starting infoProviderDomain 6 5 4 3 2 1 0 Done (pid=4952)
Starting srm-spdc00Domain 6 5 4 3 2 1 0 Done (pid=5049)
Back on SPRaid
[root@spraid root]# /opt/d-cache/bin/dcache-core start
Starting dcache services:
Starting gridftp-spraidDomain 6 5 4 3 2 1 0 Done (pid=12023)
Starting srm-spraidDomain 6 5 4 3 2 1 0 Done (pid=12113)
[root@spraid root]# /opt/d-cache/bin/dcache-pool start
Starting dcache pool: Starting spraidDomain 6 5 4 3 2 1 0 Done (pid=12290)
Let's check the space:
[root@spdc00 log]# df -h
/dev/sda5 2.0G 1.8G 72M 97% /var
It got worse! Better to move the compressed logs somewhere else:
[root@spdc00 log]# mv /var/log/srm-spdc00Domain.log.gz.* /home/mdias/.
[root@spdc00 log]# df -h
/dev/sda5 2.0G 1.6G 276M 86% /var
That is still a lot. However, lsof shows a process holding the deleted log open:
[root@spdc00 log]# lsof /dev/sda5
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
tail 1635 mdias 3r REG 8,5 1278886709 131165 /var/log/srm-spdc00Domain.log (deleted)
[root@spdc00 log]# kill -9 1635
[root@spdc00 log]# df -h
/dev/sda5 2.0G 417M 1.5G 22% /var
[root@spdc00 log]# mv /home/mdias/srm-spdc00Domain.log.gz.* /var/log/.
[root@spdc00 log]# df -h
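Sorry! I should have closed the monitoring tail before compressing. Silly mistake! The tail was still holding the original, already deleted srm-spdc00Domain.log open (about 1.2 GB), so its space was only released once that process was killed.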
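A file that has been deleted but is still held open by a process keeps occupying disk space, and this can be spotted before any cleanup. A minimal sketch, assuming the lsof column layout shown in the transcript above (COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME) and paths without spaces; the function name deleted_open_files is hypothetical:

```shell
#!/bin/sh
# Hypothetical filter: reads lsof output on stdin and prints
# "PID SIZE PATH" for every entry that lsof marks as "(deleted)".
# Assumes the 9-column layout seen above; other lsof versions may
# print extra columns, in which case the field numbers need adjusting.
deleted_open_files() {
    awk '$NF == "(deleted)" { print $2, $7, $(NF - 1) }'
}

# Example:
#   lsof /dev/sda5 | deleted_open_files
```

Any PID it reports should be closed or restarted before compressing or removing the file it holds. lsof also has a `+L1` option that restricts the listing to open files with a link count below one, which should narrow the output to unlinked files.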
In the PhEDEx monitoring, everything is UP in the production instance. In dCache, both Cell Services and Pool Usage are OK.
Updates