Setting up multiple accounts

cycomachead · August 12, 2019, 10:43pm

Really?

Is a CSV upload function that creates accounts not enough for now?

Maybe I should have phrased that question not as "Why not do this" -- but more of "Have we tried this?" Because I can tell you that a number of teachers have done that. Though I'm not aware of any of them contacting us - I just know what data is in the database.

cycomachead · August 13, 2019, 1:07am

Sorry, I missed this, but no - I never said we had a bulk creation tool ready. What I did say was that people could do it themselves.
I said we could build an API very easily, but I haven't done that because BH essentially said no to the idea of an API without thinking about how the cloud works, and suggested options like PDFs and sharing stuff instead. I am mostly frustrated by the offhand dismissal of known and understood options like OAuth or LTI or LDAP -- though it's hard today, knowing where we might want to go makes sense before we build a feature that randomly restricts the format of usernames or passwords.

But if you want an example about what works today, here's how to do it in JavaScript.

// Bulk create Snap! Cloud Accounts
// Not tested against the live API, but a close scaffold.
const fs = require('fs');
const crypto = require('crypto');
const https = require('https');
// https://www.papaparse.com
const Papa = require('papaparse');

// Read the file as text; Papa.parse expects a string, not a Buffer.
const csvText = fs.readFileSync('users.csv', 'utf8');

// CSV should contain a header row
// username,email,password
// With `header: true` each row is parsed into an object keyed by column name.
const userData = Papa.parse(csvText, { header: true, skipEmptyLines: true }).data;

// Loop over our CSV data, and generate a request to Snap!.
userData.forEach(record => {
  // An example/guide: https://www.nodejsera.com/snippets/nodejs/sha512-hash.html
  const passwordPreHash = crypto
    .createHash('sha512')
    .update(record['password'], 'utf8')
    .digest('hex');
  // Build the data for the request; URLSearchParams takes care of the encoding.
  const requestData = new URLSearchParams({
    email: record['email'],
    password: passwordPreHash,
    password_repeat: passwordPreHash
  }).toString();

  // Make an actual request to Snap!
  const username = encodeURIComponent(record['username']);
  const req = https.request(
    `https://snap.berkeley.edu/api/v1/users/${username}?${requestData}`,
    { method: 'POST' },
    (res) => {
      res.on('data', (chunk) => {
        console.log(`Snap!: ${chunk}`);
      });
      res.on('end', () => {
        console.log('Done.');
      });
    }
  );
  req.on('error', (e) => {
    console.error(`problem with request: ${e.message}`);
  });
  req.end();
});

cycomachead · August 13, 2019, 2:18am

Here, this thing actually works:

Give it a CSV with columns username, email, password

Aside from the fact that the error feedback is terrible and it's Snap!, and not a dedicated webpage, and you need to log out -- this is pretty much functionally equivalent to what we can build.
(And a real bulk endpoint would probably require you to be logged in, but that's a separate issue just to prevent spam.)

bh · August 13, 2019, 6:54am

No, that's just what we don't want to do. I don't care what teachers want, in this matter. I think our policy should be that we will not ever under any circumstances allow a school to send us a spreadsheet (or anything else) with student information on it. If a teacher tries to do that, we bounce the email. If a teacher sends us anything that looks as if it refers to human beings, we cancel their teacher privileges.

The only thing we ever allow a teacher to send us is one positive integer. Then we send them a csv with that many logins and passwords. If the teacher can't figure out what to do with that, let them ask their IT person. Let them do a join of our (trivial) database with their database, on their own computer.
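
For concreteness, the "one positive integer in, a csv of logins out" idea could look roughly like the sketch below. Nothing here is existing Snap! Cloud code: the function names, the `class-<batch>-<n>` username pattern, and the 10-character password length are all assumptions made for illustration.

```javascript
// Sketch only: the username pattern and password length are assumptions.
const crypto = require('crypto');

// Skips look-alike characters such as 0/O and 1/l/I so printed passwords are easy to type.
const ALPHABET = 'abcdefghjkmnpqrstuvwxyzABCDEFGHJKMNPQRSTUVWXYZ23456789';

function randomPassword(length = 10) {
  let out = '';
  for (let i = 0; i < length; i++) {
    // crypto.randomInt gives an unbiased index, unlike Math.random.
    out += ALPHABET[crypto.randomInt(ALPHABET.length)];
  }
  return out;
}

function generateAccountsCsv(count, prefix = 'class') {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError('count must be a positive integer');
  }
  // A random tag keeps one batch's usernames from colliding with another's.
  const batch = crypto.randomBytes(3).toString('hex');
  const rows = ['username,password'];
  for (let i = 1; i <= count; i++) {
    rows.push(`${prefix}-${batch}-${i},${randomPassword()}`);
  }
  return rows.join('\n');
}

console.log(generateAccountsCsv(3));
```

The output contains no student information at all, so any join with a class roster would happen on the teacher's own computer.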

Not at all. A submission is a project. Yes, we store projects. :~) If a teacher attempts to attach a grade to a submission on our server, we cancel their teacher privileges. If they want to grade students, they do it on their computer. And then they worry about FERPA.

Maybe I'm missing something, but if the 20 students all turn in solutions to the same assignment, they're pretty much all going to be identical.

This is a contradiction in terms. "dead simple" means no user choice. "Users choice" means incredibly complicated. I think "dead simple" is the way to go.

bh · August 13, 2019, 6:58am

What? How did they do that? Store student information on our server? Surely our database doesn't have fields such as name, does it? All it should have is username and password, period. If teachers are making usernames that include real names, we should revoke their account.

bh · August 13, 2019, 7:01am

If that's currently possible on our system, please figure out quickly how to make it not possible.

cycomachead · August 13, 2019, 8:08am

You do realize that such things mean management work and features that don't even remotely exist, right? That's why I am pushing back here. "Teacher privileges" means we know who a teacher is or that we have some kind of account type system, which we do not.

Also, can we really expect a feature like this to be something only 3 or 4 of us can do? Certainly that works for a little while, but when it's the start of school season and we have dozens of people emailing us, that probably won't be so great.

Really? That doesn't work for every teacher and clearly if schools / users are OK with the choice of using personal emails I don't see why we shouldn't support that.
The act of joining may be "trivial" but if you're dealing with different assignments, it does mean you need to keep doing things over and over. Plus, the problem with sequential / numerical data is that it's not obvious to spot errors. I know plenty of people who have joined spreadsheets on student IDs and only notice mistakes when some other data like emails and names or something didn't line up.

You underestimate the solution space of even simple exercises
Plenty of assignments do, and should involve creative work! I honestly don't get why you don't think class projects would be more personal to students. It's arguably one of the best things about CS10 and BJC. The projects that we love to share are class assignments where students have gone and built something unique.

Obviously, the rules around this stuff get tricky, and IANAL, but the second you start calling something a "submission" or an "assignment" the game changes. The Snap!Cloud then goes from a system that students can use, to one they must use. While I certainly want students to use the Snap!Cloud (for many reasons, not just convenience), there isn't anything about BJC or any course using Snap! that requires you to need a cloud account. You can do all your work without ever signing in. If a teacher accepts submissions through the cloud or we start having classroom support that changes. Now, I do think those features are useful, but they are pretty incompatible with limiting PII.

Dead simple means what we chose to build not what that functionality allows. Simple tools can be more functional than complex ones. Unix philosophy and all that.
What I mean is very simple: take input and send it somewhere. More complex is building systems that do validations on that input. Most complex is building systems that generate that input. That's why what I was trying to propose was just CSV → accounts. Because that we can build. That is flexible enough to adapt to different needs without rewriting any code.

Look at that Snap! project: What it does is quite simple, but also quite flexible because you can very easily swap out parameters and whatnot.

Teachers went ahead and either told users to create accounts using their email address or manually created 30 accounts. I don't know which they did, but they did it a while ago.

There are email addresses which have 200+ accounts with unique usernames -- they're not student names for the username that I saw, but they are very clearly accounts belonging to a teacher in a course. Some are typical usernames, some are student IDs. Some of them might even exist from before the migration.

The Snap!Cloud requires a username (a string that can be anything) and an email address (also basically can be anything...sigh.) for every account. We're not in a position to be able to remove either of those constraints -- though it is theoretically possible.

Look here: Metabase
(Note to others: This is an internal Snap! team tool.)

Also, the source code and API are public. Anyone who's done some web programming would be able to figure this out if they tried. I mean, I help build the software and I just had to re-read the API docs and peek at the code to see what was happening, it wasn't like I used any special knowledge to build my example project. Maybe one of those teachers has their own script? They teach computer science...they're certainly capable!

What?! I don't know what you want to make not possible. How do you propose a single user sign up for a Snap! account? We can't prevent people from signing up. That's literally what my Snap! project does. It is the equivalent of using the "create account" form. But it's a program, not a human. So it can do that process quickly.

What do you propose we do? A captcha?! Nope. Terrible accessibility, and Google tracking/AI... Rate limiting? We'll end up preventing a whole class from signing up at once. We could make this process harder, but it makes it harder for that 99.9999% of legitimate users of Snap! too. Sure there are distributed rate limiting tools, and ways of handling legitimate activity, but those are difficult to set up, and usually somewhat costly to support.
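
To make the rate-limiting worry concrete, here is a small sketch of a naive sliding-window limiter. It is not anything the Snap! Cloud runs; the limit of 10 signups per minute, the choice to key on IP address, and the function names are all assumptions picked for the example.

```javascript
// Sketch only: the limit, window, and keying by IP address are assumptions.
function createRateLimiter(maxRequests, windowMs) {
  const hits = new Map(); // key -> timestamps of recently allowed requests
  return function allow(key, now = Date.now()) {
    // Keep only the timestamps that still fall inside the window.
    const recent = (hits.get(key) || []).filter(t => now - t < windowMs);
    const allowed = recent.length < maxRequests;
    if (allowed) recent.push(now);
    hits.set(key, recent);
    return allowed;
  };
}

// A class of 30 behind one school IP address signs up one second apart.
const allowSignup = createRateLimiter(10, 60 * 1000);
const results = [];
for (let student = 0; student < 30; student++) {
  results.push(allowSignup('203.0.113.7', 1000 * student));
}
const blocked = results.filter(ok => !ok).length;
console.log(`${blocked} of ${results.length} signups blocked`);
```

With these made-up numbers, two thirds of the class is turned away, even though every request is legitimate. Raising the limit enough to let a class through also lets a script through.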

Brian, you're telling me things that I don't think you're fully thinking through. If a

Also some other related thoughts around emails, which are definitely a separate feature, but right now < 5% of our emails bounce though that's technically higher than it should be. I would estimate that 30-50% of that (1.5-2.5% of total emails) are due to schools blocking email addresses. The other 40% (2% total) is due to typos / bad email addresses / disabled accounts. The remaining 10% have cryptic error codes I can't interpret yet...

There are 8 schools in the past week that I know have email systems that blocked us. 7 of 8 use Google, and 1 uses Microsoft.

So, when I talk about login integrations, the point is not that they are immediately simple or straightforward. The point is that I'm trying to understand who our users are and what they are using in their classrooms and what are the many possible tools that might help them. I am not saying we build or commit to anything right away or that this even prevents us from building our own bulk accounts feature.

Sometimes the complex work is an investment in that it literally saves time in the long run and could work for lots of schools, and maybe it saves us from building other tools. I'm not arguing to build these integrations right now, I'm arguing that we stop dismissing things offhand without understanding the details of what's happening.

And of course, this is an example. It isn't the only other option -- there's teacher/parent accounts, there's linking, etc other complex features that we should understand before inventing something.
We complain about how Silicon Valley doesn't consider how their software can be abused. Or that they don't build things users need. We should be very intentionally designing the software we are building and putting out there.

cycomachead · August 13, 2019, 8:29pm

@excite I am sorry we got off track, but I am genuinely trying to solve the problem. I was mostly arguing with BH about details.

As a proof of concept, does this Snap! project work? Snap! Build Your Own Blocks

Also I kinda brushed off LDAP by saying it's too hard...aside from the fact that it would be more work to set up, what do you imagine the user flow would be to sign into Snap!? Would students need to select their school first? Also, what would an LDAP login return to Snap!? You brought it up as a possibility, so I would be interested to hear how you'd like that to work.
Also, what makes LDAP different from Google SSO? We have had a number of requests for Google from users who would be able to use it -- so I'm just trying to understand if that's some local policy or something technical or what. I'm sure there's many things that can be different...but what matters to you in this case?

bh · August 13, 2019, 10:31pm

LDAP is, you know, a standard. Google SSO is surveillance capitalism.

I had thought that we secured logins and account creation so that only Snap! can do them. If users can roll their own bulk account creation, then pornographers will do so.

Okay, I guess I see that it's not simple either way. But our target is no student SSI in any form. (Yes, term projects are an issue, I get that.) What if our ToS prohibit uploading SSI but we don't enforce it unless it's called to our attention?

I do think that if we provide a simple bulk account tool, people will use it rather than roll their own.

I guess I'm convinced that we're already in deep trouble. I don't know what to do about it, except that exacerbating it by involving third parties can only make things worse.

Let's arrange a phone meeting. Mon?

cycomachead · August 14, 2019, 2:51am

Really, this isn't helpful. Google SSO is based on OAuth2, which is a standard and it is configurable as to how much information each party shares. I can certainly understand why a school would not choose Google, but what I don't know is why, if a school already has Google, they would use a secondary authentication system. There's probably a reason, but I just don't know it.
It's also worth mentioning that G Suite's OAuth is in many cases just a wrapper around another auth system -- like at Berkeley.

G Suite uses OAuth, which takes you to a Shibboleth service (an edu-focused identity server that uses SAML as the protocol), which talks to a CAS service, which now talks to DUO, which sends a request that approves your login in the CAS service. That then calls back to Google, which uses OAuth to redirect you back to where you came from.
Google's piece (OAuth), Shibboleth and CAS are all open standards or software. DUO is a private company and I don't know if it's an open 2-Factor protocol, but I don't believe it is. Though, it is very educationally oriented.

Which is to say, authentication needs can be ridiculously complex, and when and why certain systems are used depends on each school. But also, in the case of big systems, there's lots of flexibility, and most of it is using industry standard protocols.

I mean that's the way the web works. Everyone is basically anonymous -- we don't know if a request comes directly from Snap! or not. Now there are CORS restrictions. Those apply to browsers, but not to things like CLI tools. This is why I built a Snap! project and not a Google Sheet. You can't make a request from a Google Sheet to Snap! normally, though you can use a proxy.

Sure, but it's really really stupid/sad/disappointing to tell kids this!! I am so frustrated by the fact that we are trying to build a technology which is inviting and fun to use and highly engaging and we are trying to find ways to make it OK to take all the enjoyable bits away. I am serious, it's disappointing to hear that we will be OK with solutions/rules/whatever that make it harder for kids to have fun!

It's not just term projects -- we have plenty of people that make games or modify homework to make them personal. Snap! has features like webcams and microphones that inherently encourage something personal. Now most of this data is not immediately identifying - it's not often exactly names or emails, but it is often something that you could use to identify someone. Think about how kids make projects about where they grew up, what their hobbies are, or the things they struggle with. Think about the examples Mitch loves to share every year at Scratch conferences. Imagine what it would be like if those usernames narrowed down the author to 1 in 8 or even 1 in 30 students!

If my username is cycomachead -- well that could be anyone online. (Except, obviously my name is attached to that one...), but if my username is MrsWeiss-Alaya-HS-2011-2 and I built a project about photography or computers, you would have a good chance of narrowing that down to me or maybe me and 1 or 2 of my closest friends. (And this is the other ironic thing about names.... using my first name (Michael) and something generic anywhere online is 1000x harder to pin down as me than my own unique username. Which is also why I use different usernames on forums where I really want to be anonymous.)

Plus, the rules around this are nuanced -- PII data is a legal problem when it's for a school project, but not a personal project. It's also only a legal problem when uploaded to the Snap!Cloud. That same data is not a problem if turned in to the school's existing software systems.

Look, as long as we maintain our own login system we need something personally identifiable. We can sort of try to work around that by using pseudo-identifiers like emails that are shared, but even then, it's still a valid email address.

With a 3rd party login system (Google, some LDAP systems, SAML providers, LTI, other OAuth tools, Active Directory, etc. -- even FB...) we can get back as much or as little user detail as we need. But what these all have in common is that we can get back a unique, opaque ID. Look at this Google example:

id: 10769150350006150715113082367

Instead of storing some email address, we store that id. Then when they sign in again, we look up that id. Theoretically, we could even use this to allow empty usernames and replace the username field with anything we chose.

In this scenario the only thing Google (or any other identity provider) knows is that I made a login, and that the login was for Snap!. But once inside the app, Google or anyone else won't know anything. And other than the fact that this person came through via Google, we know even less about them than we do now.
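
The store-the-id-then-look-it-up flow can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory Map stands in for the real user table, the actual OAuth exchange that produces the id is left out entirely, and `signInWithProvider` and the generated username format are hypothetical names.

```javascript
// Sketch only: an in-memory Map stands in for the real user table,
// and the function and field names are hypothetical.
const usersByExternalId = new Map();
let nextUserId = 1;

function signInWithProvider(provider, externalId) {
  // Key on provider + id so the same id from two providers never collides.
  const key = `${provider}:${externalId}`;
  let user = usersByExternalId.get(key);
  if (!user) {
    // First sign-in: create an account that holds no email and no real name.
    const id = nextUserId++;
    user = { id, username: `snapper-${id}` };
    usersByExternalId.set(key, user);
  }
  return user;
}

// The opaque id is kept as a string; it is too large for a JavaScript number.
const first = signInWithProvider('google', '10769150350006150715113082367');
const again = signInWithProvider('google', '10769150350006150715113082367');
console.log(first === again); // the second sign-in finds the same account
```

The only thing stored per account is the provider name, the opaque id, and whatever username we generate or let the user pick.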

It's also worth noting that the unique identifier doesn't always have to be something private. For example, the default ID an application that uses CalNet will store is the UID. For me the UID is 952109, and you can find that out by searching directory.berkeley.edu. Applications storing a UID is preferable to the student ID, which is not public, due to the FERPA rules about directory information. So for some systems the best "legal" path is to store something which may not be directly identifiable, but something you can very easily look up.

The point in this example is that there is no perfect world. I'm not saying we go and build custom integrations for every school. I'm not right now even arguing that we have to do Google. Though I think Google probably makes sense, I'm mostly arguing for the position that we stop dismissing things offhand because they are unknown or seem difficult. Our simple to implement solutions also have their own challenges. Maybe those challenges are down the line, but I want to make sure we are aware of them before we reinvent the wheel half a dozen times. There is no free lunch here.

(Aside: the personal examples I used are factual information about me. Though, that info is already online in places so I'm not revealing anything. But it could be, if that info weren't already out there.)

bh · August 14, 2019, 6:23am

You don't think surveillance capitalism is a thing? Or you don't think it's a problem?

I don't want to be having this argument. Yes, you've convinced me that projects may contain problematic information. I am hoping that can be minimized by establishing that class accounts are just for class work, and fun stuff goes in your personal account.

Maybe the right thing is just to post prominently in the ToS that information on our server is not guaranteed safe against disclosure and that sending us student PII is a FERPA violation, and leave it at that.

cycomachead · August 14, 2019, 7:54am

No; surveillance capitalism is a thing, and that is a problem. I don't think that OAuth is a part of that, though. It is materially different than search and advertisement options. Or if so, I'd like to understand why. Yes, there is a log of who logged into what, but given that anyone can create an application, I think that info is of very little value to a company. Certainly, this data is essentially nothing compared to what's in email inboxes.
However, school administrators can choose when/what to allow Google logins and there's some security advantages for schools by using a central system.

The reason though that I say it's not helpful is this: When you make a statement "Google SSO is surveillance capitalism", I don't know what that means for Snap!. I mean, I understand the literal meaning, but I don't know where to go from there. You already said we might do it earlier in this thread and elsewhere... I don't know if this means "no" or "I just don't like it, but we need to do it" or if you're expecting SSO to do something that it does not.

When we have had teachers tell us that Google SSO will help them in the classroom, I want to understand what part of Snap! becomes incompatible with that.

Multiple accounts are definitely an option, we just need to think about how that process works. There's complexity there and requiring multiple accounts tends to punish the most interested students. It may legally help solve lots of problems, and it may be better than some other options, but I think this is hard to do right. I mean, it's hard to do the user friendly thing well, it's not so hard to tell people to just have 2 usernames and passwords and "figure it out".

bh · August 14, 2019, 4:55pm

If Google thinks it's worth their while offering a third party login service, it's because surveilling people's logins helps them sell advertising. TANSTAAFL doesn't apply as a universal principle (take BJC for example) but it does apply to everything Google does.

Sigh. I am understanding your point of view better now that you're saying "we have to do this complicated thing to make teachers happy" instead of saying "we should do the simple thing namely ..." followed by some enormously complicated thing.

People can change their password so as to have their class account password match their fun account password. That doesn't bother me.

We need a f2f discussion of this. I mean, yes, I'm arguing for making things a little harder for teachers (by not integrating with their CMS) in order to make things a lot simpler for us (by not making any promises about security).

So, okay, if a teacher writes a program to generate accounts one by one, they can do it however they want (but we need a capacity limit, I still say, to defeat pornographers). But if they want us to do the account generation, they send us a positive integer, and they do the integration with their CMS. We wash our hands of it.

I guess what you have to convince me of is that there's some benefit to us big enough to involve Berkeley's lawyers in making security promises about things like third-party login.

cycomachead · August 15, 2019, 10:20am

Sure it applies to BJC. Using an external system, having secondary logins, etc. is all more complex for users than if we built different things or if some school just went ahead and bought a more drop-in curriculum.

But also, I'm not claiming Google gets no benefit from SSO, but the data is rather limited and the existence of SSO doesn't require anyone to use it. (I mean, also consider the high chance that students probably Google'd Snap to find the site anyway...)

I'm sorry I'm not being clear. But I'm still not saying we "have" to do anything. I am saying we shouldn't dismiss things that appear complicated. I don't want us saying "We can't do Google auth because it's hard and gives Google XYZ access" when I don't think any part of that sentence is true. If we decide not to do something we should at least make that decision for the right reasons.

Also, while the Snap! cloud is not the same, I've built things that use Google Auth before, or other "Sign in With X" tools, and in this case it's not nearly as complex as it might seem. More complex than bulk account stuff, probably, but I actually think the user interface is far easier.

I'm explaining the details of how OAuth and SSO work because they do matter. But for the most part, we do not need to implement them. Even Bernat would agree that you don't need to roll your own security.

Let me put it this way: If I am the instructor, and I had options, I would look elsewhere if I believed I would end up with a bunch of students who have multiple accounts. Snap! does not exist in a vacuum. It's not the only tool that gets used. The reason kids / people have problems remembering their passwords is because they have too many darn passwords! Fortunately, we don't inflict any ridiculous password rules on any students...

I think that's pretty reasonable -- though if you look at the query I sent, determining a reasonable limit will be hard. (And, FWIW, the scenario you describe is, I still think, more work than a login, but it's not terrible.)

I genuinely don't understand why the lawyers need to be involved. We aren't making promises about anything. They don't review things like our use of AWS or DigitalOcean do they? I would pay a lot of money to be able to tell CS10 students they can log in with their Berkeley accounts in the Spring.

The benefits to us of SSO (Google, CalNet or otherwise):

- One-click entry into Snap!.
- No need for verification / sending emails to accounts that come in via SSO.
- Students still manage their own accounts, so there's much less need / concern about who has access to what.
  - For school-managed accounts we can tell districts that all their exotic password rules and access controls and whatnot still get to apply.
- A very straightforward implementation path -- I don't want to use the word "simple". But I will say that it is well understood.

Anyway, yeah, we should meet soon. Sometime next week?

bh · August 16, 2019, 7:58am

Oh God you should never roll your own security! That and floating point are the two things I used to tell 61A students not to try to do themselves in real work situations.

I think your view of what's simple or not is warped by your day job. :~)

Thanks for this clear exposition.

But... This, writ large, is the central problem of surveillance capitalism. The way they get people to buy into it is that it's so damn convenient to let Google run your life for you. (I've been telling people for a while now that when I can have a self-driving car, that will be the point at which I sell my soul to Google. Some people have lower bars.)

I'd like to talk about verification. I was shocked to learn, here, the other day, that if you save a project you don't have to validate the account. The point of the verification email is to prove that you own the email address, not that you exist. It's how we keep pornographers who buy lists of email addresses from using them to create zillions of accounts. I don't think SSO eliminates this need, either. SSO proves you've made an account somewhere else. It doesn't prove that the email address you gave us belongs to you. (And we do need an email address, for when someone violates the TOS, or when someone's parent wants us to delete the account.)
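
As a reference point for that discussion, the ownership check a verification email performs boils down to something like the sketch below. It is not the Snap! Cloud's implementation: storage is an in-memory Map, sending the email is left out, and the function names are hypothetical.

```javascript
// Sketch only: in-memory storage, and all names here are hypothetical.
const crypto = require('crypto');

const pending = new Map();  // token -> email address awaiting verification
const verified = new Set(); // email addresses whose owners clicked the link

function startVerification(email) {
  // The token is sent only to this address, so only the inbox owner sees it.
  const token = crypto.randomBytes(16).toString('hex');
  pending.set(token, email);
  return token;
}

function confirmVerification(token) {
  const email = pending.get(token);
  if (email === undefined) return false; // unknown or already-used token
  pending.delete(token);
  verified.add(email);
  return true;
}

const token = startVerification('someone@example.org');
console.log(confirmVerification('guessed-token')); // false
console.log(confirmVerification(token));           // true
```

Someone who bought a list of addresses can call the first function as often as they like, but without access to each inbox they never get a token to pass to the second.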

We're already getting letters from school districts asking us to sign contracts in which we promise FERPA compliance -- I know you say it's the schools that have to comply with FERPA, but they do so by passing the pain on to their suppliers.

Of course such contracts would make sense if we were charging money. If we were a company instead of a university, we'd have our own lawyers who specialize in FERPA and COPPA and GDPR. But as it is, technically we couldn't enter into a contract even if we had permission from the university, because there has to be consideration on both sides to make a valid contract, and we don't get anything from the school district.

Sure. Any afternoon except Monday would be great.

bromagosa · August 20, 2019, 9:03am

Can someone write a super short summary of what's been talked about here? Are we adding SSO? Are we creating a form for teachers to create multiple accounts at once?

cycomachead · August 20, 2019, 10:25am

tl;dr: A simple form for bulk account creation is what some teachers want. (Others have asked for Google SSO). Both solve the problem of students getting into Snap! more quickly.

Whenever we talk about creating accounts, we also talk about PII and FERPA and classroom accounts and linking stuff. And for those things we have lots of ideas but not great solutions.

Brian and I are gonna meet Thursday... My gut right now says there's room for both SSO and bulk account creation. They solve different problems for different groups...but at least the basics are relatively easy and understood there.

And doing both of those is easier than the classes stuff, but neither really completely solves issues of personal info or issues of FERPA. So... I don't know how we are really supposed to deal with that stuff.

Oh and the laws around PII are convoluted and dumb and we all have disagreements about what is and isn't OK and none of us are lawyers.

cycomachead · August 20, 2019, 10:26am

Also, I have replies to BH's latest post, but not sure if it's worth continuing here or not.

bromagosa · August 20, 2019, 3:46pm

Thanks for the TL;DR! I'll try to be short because I don't want to add too much fuel to the fire.

I think you all know my opinion on allowing EvilCorp* sign-in. I know we're not telling people to make an EvilCorp account, just letting them use their existing account to log in, if they already have it. But, in any case, this kind of implies we're legitimating the use of EvilCorp by students. It kind of says we're okay with educational institutions telling their students to trust EvilCorp.

I wish Mozilla had succeeded in their Persona initiative. It seems they're considering making Firefox Accounts available for other sites to use, but right now you can't do that.

Aren't there any other truly ethical login services we can use?

What about a QR+phone login system? Does anyone know of something like this that is not just a proof of concept? (it doesn't sound crazy to implement a similar thing from scratch...)

(*) read Google, Facebook, Twitter, Microsoft and probably even GitHub nowadays.

bh · August 20, 2019, 5:33pm

Right, I think the ethical issues around SSO should be postponed by not doing it now. We can argue about it in Heidelberg. Meanwhile we do Dirt Simple Bulk Account Creation (the DSBAC protocol :~) ).

But I'm thinking about other things until meeting with Michael Thurs.