Problems After CA 1.88-1

From GridPP Wiki
Revision as of 15:32, 5 December 2017


Following the release of lcg-CA (etc.) 1.88-1, various authentication problems cropped up at several sites. Some of them are described in Appendix 1, together with a short-term workaround that keeps a site going by rolling back the CA certificates. However, the deadline for updating to 1.88-1 was 2017.12.04, which has already passed, so I'm listing here the basic cause of the problem, as determined by Robert Frank, and some other measures sites can take to maintain operations.

The factors necessary for this problem to happen are:

  • It happens when one system sends proxies to another, which authenticates them using bouncycastle.
  • The systems must have different versions of the CA certificates: one must have 1.87-1 and the other 1.88-1. It doesn't matter which way round; they just have to be different.
  • The system that is authenticating incoming proxies (e.g. CREAM, ARGUS or similar) must be on bouncycastle-1.46-1 (the unpatched version).
  • The system that is authenticating incoming proxies must have a CRL for the UK CA cert (UKeScienceCA-2B) that contains no "non-critical extensions".
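The second condition can be checked by comparing the CA release installed on the two systems involved. Below is a minimal sketch that reads "rpm -qa" output on stdin and prints each distinct CA release it finds. It assumes the IGTF CA packages are named ca_<Name>-<version>-<release>.<arch> (the ca_ prefix is what the removal loop in Appendix 1 relies on); the function name ca_versions and the hostname are hypothetical.

```shell
# Minimal sketch: print each distinct CA release found in `rpm -qa`
# output (stdin). Assumes CA packages are named
# ca_<Name>-<version>-<release>.<arch>, e.g. ca_UKeScienceCA-2B-1.88-1.noarch.
ca_versions() {
    # Keep only ca_ packages and extract "<version>-<release>".
    sed -n 's/^ca_.*-\([0-9][0-9.]*-[0-9][0-9]*\)\.[^.]*$/\1/p' | sort -u
}

# Usage (hostname is a placeholder):
#   rpm -qa | ca_versions                        # this node
#   ssh argus.example.org rpm -qa | ca_versions  # the ARGUS server
```

If the two nodes print different releases (one 1.87-1, the other 1.88-1), the second condition holds for that pair.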

If any of those conditions are missing, then the error doesn't happen. So fixes are:

  • Jens is sending out a CRL with a "non-critical extension". A site could wait until this CRL propagates around the sites. Initial tests of this idea showed that the root certificates may also need changes, which is being investigated (Jens, please confirm when you "know" this actually works.)
  • Use a better version of bouncycastle. One way to get that on ARGUS is to install it on CentOS 7, perhaps with UMD3 or UMD4. I haven't tested this, but Chris Brew assured us that the problem doesn't occur on CentOS 7 (Chris, please confirm the version of BC on your ARGUS server).
  • If you use ARC, you could update bouncycastle to the "Robert Frank" patched version and then update everything to 1.88-1. This has been tested and appears to be safe. The name of the rpm is bouncycastle-1.46-1.el6.1.noarch, and Robert provides details in Appendix 2.
  • Any other options ...
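To see whether the reissued CRL has reached a particular node (and hence whether the fourth condition still holds there), one can list the non-critical extensions in the CRL's own "CRL extensions:" section. This is a sketch that parses the text printed by "openssl crl -noout -text"; the function name is hypothetical, and it assumes extensions are shown as "X509v3 <Name>:" with "critical" appended for critical ones. The file locations in the usage comment are also assumptions: the CA certificate as UKeScienceCA-2B.pem and the CRL as <hash>.r0 (the form fetch-crl normally writes). Empty output means the CRL has no non-critical extensions.

```shell
# Minimal sketch: print the non-critical extensions found in the top-level
# "CRL extensions:" section of `openssl crl -noout -text` output (stdin).
# Assumptions: extension headers look like "X509v3 <Name>:" and critical
# ones end in "critical"; extensions shown only as numeric OIDs are missed.
# Entry extensions under "Revoked Certificates:" are deliberately ignored.
list_noncritical_crl_exts() {
    awk '
        /^ *CRL extensions:/ { in_ext = 1; next }
        /^ *(Revoked Certificates:|No Revoked Certificates)/ { in_ext = 0 }
        in_ext && /^ *X509v3 [^:]*:/ && $0 !~ /critical *$/ {
            line = $0
            sub(/^ */, "", line)   # drop indentation
            sub(/: *$/, "", line)  # drop trailing colon and spaces
            print line
        }
    '
}

# Hypothetical usage on a node (paths are assumptions):
#   d=/etc/grid-security/certificates
#   h=$(openssl x509 -hash -noout -in "$d/UKeScienceCA-2B.pem")
#   openssl crl -in "$d/$h.r0" -noout -text | list_noncritical_crl_exts
```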


Appendix 1

In the last days of November 2017, a new set of root certificates, version 1.88-1, was released. At Liverpool, the rpms on our central ARGUS server were updated automatically on the evening of 27th Nov. Once the already-queued jobs had started, the number of running jobs began to dwindle, and I noticed it the next morning. The /var/log/argus/pepd/process.log file on the ARGUS server contained lots of errors like this:

2017-11-27 18:00:03.102Z - ERROR [TrustStoreValidationErrorLogger] - Validation error: error at position 0 in chain, problematic certificate subject: CN=hepgrid11.ph.liv.ac.uk,L=CSD,OU=Liverpool,O=eScience,C=UK (category: CRL): Can not verify the CRL as its issuer's public key is unknown or can not be validated Cause: Certification path could not be validated. Cause: NullPointerException
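To tell quickly whether a node is hitting this particular failure, the matching lines can be counted. This is a minimal sketch, assuming each error is logged on a single line (the excerpt above is wrapped for display) and that the log lives at the path quoted here; the function name is hypothetical.

```shell
# Minimal sketch: count the CRL-related NullPointerException validation
# errors in the ARGUS PEP daemon log. Assumes each error is logged on a
# single line.
count_crl_npe_errors() {
    # Default to the log path quoted on this page; pass another file to override.
    # grep -c prints 0 (and exits 1) when nothing matches; `|| true` keeps that harmless.
    grep -c 'TrustStoreValidationErrorLogger.*(category: CRL).*NullPointerException' \
        "${1:-/var/log/argus/pepd/process.log}" || true
}

# Usage: count_crl_npe_errors                      # default ARGUS log
#        count_crl_npe_errors /path/to/other.log
```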

It was affecting our CEs and our DPM headnode (hepgrid11). I got the site back up temporarily by following these steps.

To get the site running again, I rolled back to 1.87-1 on our CEs, SE and ARGUS server. To roll back, I first removed the existing references to the current repo, then put in the repo file listed below, which points to the old 1.87-1 versions (the baseurl differs from the standard location).

# pwd
/etc/yum.repos.d
# cat EGI-trustanchors.repo
[EGI-trustanchors]
name=EGI-trustanchors
baseurl=https://egi-igtf.ndpf.info/distribution/egi-1.87-1/ca-policy-egi-core-1.87-1/
enabled=1
gpgcheck=0
priority=3
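The same repo file can be written in one step. This is a minimal sketch that reproduces the definition above verbatim; the function name is hypothetical, and the target directory is a parameter so the result can be inspected in a scratch directory before touching a real node.

```shell
# Minimal sketch: write the rollback repo definition shown above.
# The directory is a parameter (default /etc/yum.repos.d).
write_rollback_repo() {
    repo_dir=${1:-/etc/yum.repos.d}
    # Quoted 'EOF' stops the shell from expanding anything in the body.
    cat > "$repo_dir/EGI-trustanchors.repo" <<'EOF'
[EGI-trustanchors]
name=EGI-trustanchors
baseurl=https://egi-igtf.ndpf.info/distribution/egi-1.87-1/ca-policy-egi-core-1.87-1/
enabled=1
gpgcheck=0
priority=3
EOF
}
```

This overwrites any existing EGI-trustanchors.repo. Other repo files that still point at the current distribution have to be disabled by hand, and the priority= line only has an effect when the yum priorities plugin is installed.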

It may be possible to use “yum history” for this, but I used these commands to remove the newly installed 1.88-1 CAs.

# for p in $(rpm -qa | grep -F 1.88-1 | grep '^ca_'); do yum -y remove "$p"; done

Then check for other packages of version 1.88-1 and, if any, remove those too, by hand.

# rpm -qa | grep 1.88-1

Then yum install (or update) lcg-CA (or ca-policy-egi-core, or whatever it is you use). Obviously, this is OK for the time being, but we’ll have to go to a new version of the CAs sometime soon.

Appendix 2

Robert Frank's notes on his update to bouncycastle.

I've built an SL6 bouncycastle rpm which uses the fixed implementation of that function:

rpm: http://mirror.tier2.hep.manchester.ac.uk/Repositories/local/6/x86_64/bouncycastle-1.46-1.el6.1.noarch.rpm
src: http://mirror.tier2.hep.manchester.ac.uk/Repositories/local/6/sources/bouncycastle-1.46-1.el6.1.src.rpm
patch: http://mirror.tier2.hep.manchester.ac.uk/tier2/deltacrl.patch

After installing the rpm, my problems with the Java VOMS clients disappeared. More testing is needed, though.
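After installing the rebuilt rpm, it is worth confirming which bouncycastle a node actually has. This is a minimal sketch that classifies the output of "rpm -q bouncycastle". It assumes the patched build is exactly bouncycastle-1.46-1.el6.1.noarch, as named above, and treats any other 1.46-1 build as the stock, unpatched one; the function name classify_bc is hypothetical.

```shell
# Minimal sketch: classify the output of `rpm -q bouncycastle`.
# Only the package names quoted on this page are known; any other
# bouncycastle build is reported as "other" rather than guessed at.
classify_bc() {
    case "$1" in
        bouncycastle-1.46-1.el6.1.noarch) echo patched ;;    # Robert Frank's build
        bouncycastle-1.46-1.*)            echo unpatched ;;  # assumed stock 1.46-1
        bouncycastle-*)                   echo other ;;
        *)                                echo absent ;;     # e.g. "package bouncycastle is not installed"
    esac
}

# Usage: classify_bc "$(rpm -q bouncycastle)"
```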