Sep 21, 2007

Part of a series on Sun’s OpenID@Work initiative; see the introduction for more context.

Data governance is the term for knowing what happens to the data you store, particularly when that data includes PII (personally identifiable information), as the OpenID IdP’s data does. Using OpenID isn’t the reason we keep this information: any registration system keeps at least some information about the people who have accounts on it, even if it’s only a name, email address, and password (or OpenID identifier). I thought it might be useful to others to see some of the basic steps we went through when discussing how to protect that PII, and some of the decisions we made about what data to keep and what not to keep. If you’re setting up a registration system yourself, you may make completely different decisions, depending on what information you keep and what your registration system is used for.

Obviously, step 1 is to make someone responsible for figuring it out. In our case, that person was me, with the grand title of “Data Steward” in Sun’s process. Yes, there’s a process to be followed, checklists to be filled out, and people whose job it is to help us figure it all out (the Chief Privacy Office, with Michelle Dennedy and her team). In outline, you need to:

  1. figure out what data you need to keep, whether for technical or policy reasons
  2. figure out who will need access to the data
  3. figure out how to stop people who don’t need access from reaching the data
  4. figure out when you can destroy the data
  5. write the decisions up and make the information available

What data needs to be kept?

In this service, people can use fake names, but they often choose to use their real ones. For compliance reasons, in case an allegation of wrongdoing by a user needs to be investigated, we need to keep the employee ID that was used to sign up for the OpenID identifier. Even after the OpenID account is closed, that information is kept for a set period to allow any problems to surface. Yes, users are warned about this during the registration process.

The web server logs are in the Common Log Format, which includes a record of the HTTP GET request from the consuming site (relying party) asking for authentication of the OpenID identifier. This GET request includes both the OpenID identifier and the site’s URL, which allows correlation of who went where (though not what they did after logging in). This happens with every OpenID Identity Provider that keeps web server logs, which I would guess is basically all of them, so it’s certainly not a problem specific to Sun’s service: every OpenID IdP could perform such correlations about its users. This isn’t necessarily a problem, and some people would argue that being able to see where an OpenID identifier has been used helps build reputations, but it also has privacy implications. I might not want my employer (or anyone else, for that matter) knowing which sites I visit, how often, and when. So, on principle, we mask the data so that we can see how often a site is visited, but not who is doing the visiting.
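
To make the masking concrete, here is a minimal Python sketch that rewrites one Common Log Format line so that the relying party’s URL survives but the user does not. It assumes the identifier travels in the `openid.identity` or `openid.claimed_id` query parameter, and that the client host, ident, and authuser fields should be blanked as well; `mask_line` and the `MASKED` placeholder are hypothetical names, not the script Sun actually runs.

```python
import re
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^\S+ \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\S+)$'
)

# Assumption: query parameters that carry the user's identifier (OpenID 1.1/2.0 names).
IDENTITY_PARAMS = {"openid.identity", "openid.claimed_id"}
MASK = "MASKED"


def mask_line(line: str) -> Optional[str]:
    """Mask the client fields and the OpenID identifier in one CLF line.

    Returns None for lines that don't parse, so they can be dropped rather
    than copied through with identifying data still in them.
    """
    m = CLF.match(line.rstrip("\n"))
    if m is None:
        return None
    parts = urlsplit(m["target"])
    # Keep every parameter (including the relying party's return URL),
    # but replace the value of any identity parameter.
    query = [
        (key, MASK if key in IDENTITY_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    target = urlunsplit(parts._replace(query=urlencode(query)))
    # host, ident and authuser can also identify the user, so blank them too.
    return (f'- - - [{m["date"]}] "{m["method"]} {target} {m["proto"]}" '
            f'{m["status"]} {m["size"]}')
```

Dropping unparseable lines instead of passing them through is a deliberate choice in this sketch: for privacy, a line you can’t interpret is safer discarded than kept.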

Who needs access to the data?

If there is an allegation of wrongdoing by a user, Corporate Compliance may need access to the information about whose OpenID identifier it is, and to the web server logs showing whether the user actually logged in to the web site in question. This data is passed on only after Sun’s legal team has reviewed the allegations.

Apart from that, support personnel need access to the OpenID accounts to help people with things like forgotten passwords (if they didn’t set a secret question) or voluntary account deletion. The user has to file a support request using Sun’s internal support system, and the employee ID of the person filing the request has to match that of the account owner.
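
The ownership check a support engineer makes before acting on a request can be expressed as a small predicate. This is a hypothetical sketch: `SupportRequest`, `may_act_on`, the action names, and the identifier-to-employee-ID mapping are illustrative assumptions, not Sun’s support system.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical action names for the two support tasks mentioned above.
ALLOWED_ACTIONS = {"reset_password", "delete_account"}


@dataclass(frozen=True)
class SupportRequest:
    requester_employee_id: str  # assumed to come from the authenticated support system
    openid_identifier: str
    action: str


def may_act_on(request: SupportRequest, owners: Dict[str, str]) -> bool:
    """Allow the action only if the requester owns the account.

    `owners` maps each OpenID identifier to the employee ID recorded at sign-up.
    """
    if request.action not in ALLOWED_ACTIONS:
        return False
    owner = owners.get(request.openid_identifier)
    # Unknown identifiers and mismatched employee IDs are both refused.
    return owner is not None and owner == request.requester_employee_id
```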

Engineering may need access to some of the files for debugging. There is also a script that runs over the web server logs and extracts records of which sites were visited and when, discarding all information about who visited them.
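
A script of that kind might look roughly like the following sketch, which reduces log lines to a count per relying-party host and day. It assumes the relying party is named by `openid.return_to`, `openid.trust_root`, or `openid.realm` in the request; `site_visits` is a hypothetical name, and only standard-library calls are used.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Common Log Format; only the timestamp and the request target are captured.
CLF = re.compile(
    r'^\S+ \S+ \S+ \[(?P<date>[^\]]+)\] "\S+ (?P<target>\S+) [^"]+" \d{3} \S+$'
)

# Assumption: the relying party is named by one of these OpenID parameters.
SITE_PARAMS = ("openid.return_to", "openid.trust_root", "openid.realm")


def site_visits(lines):
    """Count authentication requests per (relying-party host, day).

    Nothing about the user (client host, identifier) is kept in the result.
    """
    counts = Counter()
    for line in lines:
        m = CLF.match(line.rstrip("\n"))
        if m is None:
            continue
        params = parse_qs(urlsplit(m["target"]).query)
        site = next((params[p][0] for p in SITE_PARAMS if p in params), None)
        if site is None:
            continue  # not an OpenID authentication request
        day = datetime.strptime(m["date"], "%d/%b/%Y:%H:%M:%S %z").date()
        counts[(urlsplit(site).hostname, day)] += 1
    return counts
```

Keeping only (host, day) keys is one way to get “how often, but not who”; a coarser or finer time bucket would work just as well.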

Restrict access

Only a few people have access to the accounts: support, engineering, and me as data governance steward. That access is controlled through operating-system access controls. The same applies to the logs. Everyone who has access has gone through training so that they know the privacy conditions that apply to using the information (i.e., it is used only for debugging, or for support once the user’s identity has been verified as described above).

As a side note, to log in to my account on the machines, I have to log in to Sun’s internal network, ssh from there to the machine I want to access, and then log in with my standard Sun credentials followed by a one-time password that uses a challenge-response mechanism with a secret passphrase. Then I need to su to the appropriate user account, using yet another password (of course).

Destroying Data

Once an account has been deactivated, either because the employee left Sun or because they asked for it to be deleted, it remains inactive for 6 months. Once that time has passed, the account is deleted. The web server logs are deleted automatically after 6 months. This period was chosen because it seemed to satisfy both the privacy principle (delete as soon as possible) and the corporate compliance principle (keep data for a reasonable length of time, just in case it’s needed).
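
The retention rule amounts to a date comparison. A minimal sketch, assuming each deactivated account is stored with its deactivation date and that “6 months” means calendar months, clamped to the end of shorter months; `add_months`, `due_for_deletion`, and `purge` are hypothetical names, not Sun’s actual tooling.

```python
import calendar
from datetime import date
from typing import Dict

RETENTION_MONTHS = 6  # the retention period stated in the policy


def add_months(d: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to the last day of shorter months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def due_for_deletion(deactivated_on: date, today: date) -> bool:
    """True once the retention window after deactivation has fully elapsed."""
    return today >= add_months(deactivated_on, RETENTION_MONTHS)


def purge(deactivated: Dict[str, date], today: date) -> Dict[str, date]:
    """Return only the deactivated accounts still inside the retention window."""
    return {acct: d for acct, d in deactivated.items() if not due_for_deletion(d, today)}
```

The same `due_for_deletion` check would apply to log files if each file is keyed by the date it was rotated.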

Documenting

Once it was all figured out and reviewed by Sun’s privacy specialists, documenting it was the easy part (just like writing standards, really: reaching consensus is the difficult bit). The information is in the disclaimer that people need to agree to when they sign up for an account, in the user policy, and in the FAQ, and the more formal checklists are available from the Sun-internal project site. And people can always ask me, or email one of our mailing lists, if they have any questions.

  2 Responses to “Sun’s OpenID IdP: Data Governance”

  1. Is the PII encrypted on disk? I’m thinking of the classic case of the data owner taking a copy of the database home and having the laptop stolen, with all the PII in plain text.

  2. Nobody takes the user database home or transfers it to a laptop. Any admin or support work on user accounts is done via a web interface, so there’s no reason for anyone to put the database on a laptop and take it anywhere. The backups (disk-based, so no tapes to lose) are also stored in a secure location.
