Problems After CA 1.88-1

From GridPP Wiki

Revision as of 21:32, 5 December 2017


Following the release of lcg-CA (etc.) 1.88-1, various authentication problems cropped up at several sites. Some types of problem are described in Appendix 1, along with a workaround that keeps a site going in the short term by rolling back the CA certificates. But the deadline for updating to 1.88-1 was 2017.12.04, which has already passed, so I'm listing here the basic cause of the problem, as determined by Robert Frank, and some other measures sites can take to maintain operations.

Simply stated, the factors necessary for this problem to happen are as follows (an unexpurgated, i.e. accurate, version of this explanation is given in Appendix 3).

  • It happens where one system sends certificates to another to be authenticated by (old versions of) bouncycastle, i.e. by Java applications.
  • The systems must have different versions of the CA certificates; one must have 1.87-1 and the other must have 1.88-1; it doesn't matter which way round, they just must be different.
  • The system that is authenticating incoming certificates (e.g. CREAM or ARGUS, or whatever) must be on bouncycastle-1.46-1 (unpatched version - this version is very old, from 2013, and the bug was fixed from 1.48 onwards according to NIKHEF).
  • The system that is authenticating incoming certificates must have a CRL (UKeScienceCA-2B or root) containing no extensions (note that both sides are authenticating, usually, in the grid scenarios.)
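
The last condition above can be checked by looking at a text dump of the CRL. The following is a minimal sketch, not a tested site procedure. The function name crl_has_extensions is hypothetical, the directory /etc/grid-security/certificates and the *.r0 file pattern are assumptions about where the CRLs live, and it assumes that "openssl crl -text" prints a "CRL extensions:" header only when at least one extension is present.

```shell
# Sketch only: report whether a CRL text dump mentions any extensions.
# Assumption: "openssl crl -text" prints a "CRL extensions:" header
# only when the CRL carries at least one extension.
crl_has_extensions() {
    # Reads the text dump on stdin and prints "yes" or "no".
    if grep -q 'CRL extensions:'; then
        echo yes
    else
        echo no
    fi
}

# Hypothetical usage on a grid host (path and file pattern are assumptions):
# for f in /etc/grid-security/certificates/*.r0; do
#     printf '%s: ' "$f"
#     openssl crl -in "$f" -noout -text | crl_has_extensions
# done
```

A "no" for the UK eScience root or 2B CRL would mean that this condition is met on that host.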

If any of those conditions is missing, the error doesn't happen. So the fixes are:

  • Jens is sending a CRL with a "non-critical extension". A site could wait until the CRL propagates around the sites. Initial tests of this idea showed that the root certificates may also need changes, which is being investigated (Jens, pls confirm when you "know" this actually works.)
  • Use a better version of bouncycastle. One way to get that on ARGUS is to install it with Centos7, and perhaps UMD3 or 4. I haven't tested this, but Chris Brew assured us that the problem doesn't occur on Centos7 (Chris, please confirm version of BC on your ARGUS server).
  • If you use ARGUS or CREAM, you could update bouncycastle to the "Robert Frank" patched version and then just update everything to 1.88-1. This has been tested and seems/is safe. The name of the rpm is bouncycastle-1.46-1.el6.1.noarch, and Robert provides details in Appendix 2.
  • Any other options ...


Appendix 1 - How to Roll Back

In the last days of November 2017, a new set of root certificates was released, version 1.88-1. At Liverpool, the rpms on our central ARGUS server were updated automatically on the evening of 27th Nov. Once the already queued jobs had started, the number of jobs began to dwindle, and I noticed it the next morning. On the ARGUS server, the /var/log/argus/pepd/process.log file contained lots of errors like this:

2017-11-27 18:00:03.102Z - ERROR [TrustStoreValidationErrorLogger] - Validation error: error at position 0 in chain, problematic certificate subject:  
CN=hepgrid11.ph.liv.ac.uk,L=CSD,OU=Liverpool,O=eScience,C=UK (category: CRL): Can not verify the CRL as its issuer's public key is unknown or can not
be validated Cause: Certification path could not be validated. Cause: NullPointerException
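
To see which hosts are hit, the certificate subjects can be pulled out of such log entries. This is a rough sketch only. The function name crl_error_subjects is hypothetical, and it assumes each entry sits on a single line in the real log file (the excerpt above is wrapped for display).

```shell
# Sketch only: list the distinct certificate subjects named in
# TrustStoreValidationErrorLogger entries of category CRL.
# Assumption: each log entry is on one line in the actual file.
crl_error_subjects() {
    # Reads the named files, or stdin when no file is given.
    grep 'TrustStoreValidationErrorLogger' "$@" \
        | sed -n 's/.*problematic certificate subject: *\(.*\) (category: CRL).*/\1/p' \
        | sort -u
}

# Hypothetical usage on the ARGUS server:
# crl_error_subjects /var/log/argus/pepd/process.log
```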

It was affecting our CEs and our DPM headnode (hepgrid11). I got the site back up temporarily by following these steps.

I rolled back to 1.87-1 on our CEs, SE and our ARGUS server. To roll back, I first removed the existing references to the current repo, then put in the repo listed below. It points to the old 1.87-1 versions (the baseurl is different from the standard place).

# pwd
/etc/yum.repos.d
# cat EGI-trustanchors.repo
[EGI-trustanchors]
name=EGI-trustanchors
baseurl=https://egi-igtf.ndpf.info/distribution/egi-1.87-1/ca-policy-egi-core-1.87-1/
enabled=1
gpgcheck=0
priority=3

It may be possible to use “yum history” for this, but I used these commands to remove the newly installed 1.88-1 CAs.

# for p in $(rpm -qa | grep -F '1.88-1' | grep 'ca_'); do yum -y remove "$p"; done

Then check for other packages of version 1.88-1 and, if any, remove those too, by hand.

# rpm -qa | grep -F '1.88-1'

Then yum install (or update) lcg-CA (or ca-policy-egi-core, or whatever it is you use). Obviously, this is OK for the time being, but we’ll have to go to a new version of the CAs sometime soon.
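
To confirm that the rollback took effect, it may help to summarise which CA package versions are now installed. The sketch below is illustrative only. The function name ca_versions is hypothetical, and it assumes that the CA packages all have names starting with ca_, as in the removal command above.

```shell
# Sketch only: print the distinct version-release strings of the
# installed ca_* packages.
# Input on stdin: one "name version-release" pair per line.
ca_versions() {
    awk '$1 ~ /^ca_/ { print $2 }' | sort -u
}

# Hypothetical usage on the host:
# rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' | ca_versions
```

After a clean rollback this should print 1.87-1 only. More than one line of output suggests a mix of old and new CA packages.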

Appendix 2 - Robert Frank's Bug Fix

Robert Frank's notes on his update to bouncycastle.

I've built an SL6 bouncycastle rpm which uses the fixed implementation of that function:

rpm: http://mirror.tier2.hep.manchester.ac.uk/Repositories/local/6/x86_64/bouncycastle-1.46-1.el6.1.noarch.rpm
src: http://mirror.tier2.hep.manchester.ac.uk/Repositories/local/6/sources/bouncycastle-1.46-1.el6.1.src.rpm
patch: http://mirror.tier2.hep.manchester.ac.uk/tier2/deltacrl.patch

After installing the rpm, my problems with the java voms clients disappeared. More testing is needed though.
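
One way to check which bouncycastle build a host ended up with is to classify the name that rpm reports. This is a minimal sketch under stated assumptions. The function name classify_bc is hypothetical. The patched name is the one given above (bouncycastle-1.46-1.el6.1.noarch), and any other 1.46-1 build is assumed to be the unpatched one.

```shell
# Sketch only: classify a bouncycastle package name as reported by rpm.
# Assumption: any 1.46-1 build other than the el6.1 rebuild is unpatched.
classify_bc() {
    case "$1" in
        bouncycastle-1.46-1.el6.1.*) echo patched ;;
        bouncycastle-1.46-1.*)       echo unpatched ;;
        *)                           echo other ;;
    esac
}

# Hypothetical usage on the host:
# classify_bc "$(rpm -q bouncycastle)"
```

The result "other" only means the package is not a 1.46-1 build. The sketch does not judge whether such a version has the bug.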

Appendix 3 - Robert's Full Explanation

The summary above is an easy-to-understand but slightly simplified explanation. It's probably satisfactory for most sites. But for those who prefer the truth, the whole truth and nothing but the truth, here are full, unexpurgated explanations by Robert Frank that use more accurate prose. His email also contains some info on

  1. his progress testing Jens' effort to solve the problem using non-critical extensions,
  2. possible repositories that contain ARGUS 1.7 (with a newer version of BC) for various UMD distributions, and
  3. related problems with the CANL libs, and how to work around them.

I reproduce verbatim the important sections below.


On 05/12/17 14:54, Stephen Jones wrote:
> The deadline for updating to 1.88-1 is 2017.12.04, i.e. already passed. David 
> has asked me to document what to do about it so sites can update. So, please confirm that the factors 
> necessary for this error to happen are:
>
> a) It happens where  one system sends proxys to another to be authenticated by bouncycastle.

Not quite. It happens when a certificate chain containing the 2B CA certificate is sent across (either from server to client for server certificate validation, or  from client to server for client certificate validation) and is validated with bouncycastle against the locally installed trust anchors.

> b) The systems must have different versions of the CA certificates; one must 
> have 1.87-1 and the other must have 1.88-1; it doesn't matter which 
> way round, they just must be different.

Correct.

> c) The system that is authenticating incoming proxies (e.g. CREAM or ARGUS, or 
> whatever) must be on bouncycastle-1.46-1 (unpatched version).

Again, not quite. This has nothing to do with proxies. It's any system that uses an unpatched bouncycastle to
validate a certificate chain it receives from the  remote side against it's locally installed set of trust 
anchors.

> d) The system that is authenticating incoming proxies must have a CRL for 
> the UK CA cert (UKeScienceCA-2B) that contains no "non-critical extensions".

It must have a CRL for any UK eScience CA (root or 2B) that contains no extensions at all.

To summarise, all of the following has to apply to trigger the problem:

* it effects the different versions of the UK eScience 2B CA as installed by the trust anchor releases 1.88 and 1.87 (or earlier)
* both sides use different versions of the 2B CA (installed by a trust anchor release, installed in the browser, etc)
* a certificate chain containing the 2B CA has to be transferred from one side to the other
* the side receiving the chain uses an unpatched version of bouncycastle to validate the received chain against 
  a local installation of the trust anchors
* a CRL without any extensions issued by any of the CAs in the chain is present for the local trust anchors

The above can apply to a server, a client, or both.

> If any of those things is missing, then the error doesn't happen. So fixes are:
>
> 1) Jens is sending a CRL with a "non-critical extension".  A site could 
> wait until the CRL propagates around the sites. (Jens, pls confirm when you "know" 
> this actually works!)

This might be needed for all CAs in the chain. I'll test it again once Jens issued an updated CRL for the Root CA.
It's possible that having it in the CRL of the  root CA is enough, but I won't know for sure until I've tested it.

> 2) Use a better version of bouncycastle. One way on ARGUS to get that it 
> is install it with Centos7, and perhaps UMD3 or 4. I haven't tested this, 
> but Chris Brew assured us that the problem doesn't come out on Centos7 
> (Chris, please confirm version of BC on your ARGUS server).

You can get Argus 1.7 for SL6x from UMD 4, but not from UMD 3. Argus 1.7 ships with a newer version of BC 
which doesn't have the problem.

> 3) If you use ARC (sj: typo, I meant ARGUS) or CREAM, you could update bouncycastle to the "Robert Frank" 
> patched version (details TBD) and then just update everything 
> to 1.88-1. This has been tested and seems/is safe.

Correct. If you use CREAM you can do the same, you just have to install the patched version on the CE as 
well (has been tested in Manchester).


Also, all services that use the CANL library to reload CA certificates and CRLs automatically need to be 
restarted after the update to 1.88.

Cheers,
Robert