- Chris Kurtz, System Architect, Arizona State University
- First Google Apps for Education customer
- Multiple campuses with a diverse IT infrastructure
- Large number of governing requirements: FERPA, HIPAA, plus agency agreements with DARPA, DoJ, NASA, and JPL
- Splunk is an enterprise-level product, with easy access for all departments inside the University Technology Office (ISO/InfoSec, Ops, Dev, BA/BI, Accounting, Netcom, etc.). We wanted everyone to have equal access.
The Power of Splunk
- Is ASU’s universal aggregator of all machine generated logs
- Typical response time to an incident without Splunk: multiple days.
- With Splunk, we have direct, immediate access…minutes!
Splunk and ASU
- We've had it for 4 years now.
- It needs a lot of power to run properly.
- We use enterprise search head clustering and security.
- Licensed at 1 TB/day
- Growth slowing down because we’re learning to better filter data
- Admissions and payroll are beginning to use it
We Didn’t Know!
“It was like the invention of the microscope: we didn’t know what we couldn’t see” – Martin Idaszak, Security Architect, ASU
Use Case: Protecting Direct Deposit
- Changing employee (EE) info online is great, but it's a target for hackers
- ASU has international students, faculty, and staff, so simply blocking other countries isn't acceptable
- Before Splunk: whenever an EE was missing a direct-deposit check, the investigation would take days while the case sat between HR and Payroll systems. We were hand-protecting only a handful of people's paychecks.
- With Splunk, we check geo tag info, do an affiliate lookup, and put it into an unusual changes report which payroll checks.
- Payroll will not run the payroll job WITHOUT this report now.
- This is the most valuable data I have in Splunk, by far.
- Where do you change your direct deposit from? Home and work. We take advantage of the user's "center of gravity" to determine whether a request is unusual.
- False positives? YES. False responses? NO.
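To make the "center of gravity" idea concrete, here's a minimal sketch of how such a check might work. This is purely illustrative: the coordinates, the 500 km threshold, and the helper functions are my own invention, not ASU's actual Splunk logic.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    r = 6371.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def center_of_gravity(logins):
    """Centroid of a user's recent login coordinates. Naive averaging
    is fine for points clustered in one region (home + work)."""
    lats = [lat for lat, _ in logins]
    lons = [lon for _, lon in logins]
    return sum(lats) / len(lats), sum(lons) / len(lons)

def is_unusual(logins, change_location, threshold_km=500.0):
    """Flag a direct-deposit change made far from where the user
    normally logs in."""
    clat, clon = center_of_gravity(logins)
    return haversine_km(clat, clon, *change_location) > threshold_km

# A user who logs in around Tempe/Phoenix:
recent = [(33.42, -111.94), (33.45, -112.07), (33.43, -111.93)]
print(is_unusual(recent, (6.52, 3.37)))      # change request from Lagos: True
print(is_unusual(recent, (33.42, -111.94)))  # change request from home: False
```

The threshold is the tuning knob for the false-positive/false-negative trade-off the speaker alluded to: lower it and you flag more legitimate travel; raise it and a nearby attacker slips through.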
Use Case: Phishing as a Teaching Tool
- We have 100K users. In 2015, we received 1 billion email messages; more than 750 million of them were spam or phishing.
- We have students from all across the world, transient by nature, so we can't assume traffic from Nigeria, China, or Malaysia is a hacking attempt. In fact, it's probably legitimate!
- Some Indian students were required by their parents to share their login credentials, which resulted in some interesting traffic and double logins from completely different areas! We ended up setting up special limited accounts for these parents.
- We do NOT store user emails in Splunk, only the headers that transit our system.
- “This is the best tool we’ve seen in 10 years” – Jay Steed, AVP for UTO Operations, ASU
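Those double logins from completely different areas are the classic "impossible travel" signal. Here's a hedged sketch of the idea; the function names, the 900 km/h airliner-speed threshold, and the data shapes are my assumptions, not ASU's detection code.

```python
import math
from datetime import datetime

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers."""
    r = 6371.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(logins, max_speed_kmh=900.0):
    """Return pairs of consecutive logins whose implied travel speed
    exceeds what a jet airliner could manage. `logins` is a time-sorted
    list of (datetime, lat, lon) tuples for one user."""
    flagged = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(logins, logins[1:]):
        hours = (t2 - t1).total_seconds() / 3600.0
        if hours <= 0:
            continue  # simultaneous logins handled elsewhere
        if haversine_km(la1, lo1, la2, lo2) / hours > max_speed_kmh:
            flagged.append((t1, t2))
    return flagged

# Phoenix at 9:00, Mumbai at 10:00 the same day: no airliner does that.
logins = [
    (datetime(2016, 9, 1, 9, 0), 33.45, -112.07),  # Phoenix
    (datetime(2016, 9, 1, 10, 0), 19.08, 72.88),   # Mumbai
]
```

Note that for the shared-credential case above, impossible travel is exactly what you'd see; the "special limited accounts" fix removes those legitimate-but-alarming pairs from the alert stream.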
Leveraging Your Custom Data
- It’s limited if you’re only reading logs.
- If you don't understand the context of your data sources, you won't get as much out of the product as you could.
- No schemas! No types! Eval is your friend.
- Combine all data types in any way you want, on the fly.
- “Think of it like a database where time is the primary key”
- Don’t limit the power of Splunk!
- Start using the Common Information Model now!
- Leaving data unformatted limits its value. Pull in secondary/ancillary data that makes sense of the data in your logs; it makes the field extractions more valuable.
- For ASU, the master datasource is the Data Warehouse. Affiliate ID is the unique ID.
- An isolated Splunk server running Splunk DB Connect (DBXv2) runs SQL queries against several databases and writes a series of lookup tables (keyed by Affiliate ID) every 4 hours. Linux inotify monitors the lookup tables, and on write-close copies the data to production systems (sanity checking applies).
- Heavily invested in Splunk because it solves many of our outstanding problems.
- 1st round of data onboarding concentrated on needs of ISO office
- 2nd round focused on operations needs, with some interesting use cases thrown in as they appear
- 3rd round is expanding Splunk usage and bringing it to the enterprise
- Splunk has very much paid off: savings in man-hours, extreme flexibility, use in validating other systems, and the goal of replacing antiquated systems
- Get your data into Splunk!
- Modify it later.
- Use the people who “get it” as evangelists
- Don't get caught up in "use cases." Once you have the data in Splunk, use cases present themselves repeatedly. Think of it as use cases on demand.
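One way to picture the lookup-table hand-off described earlier (DB Connect writes lookups keyed by Affiliate ID; inotify fires on write-close; sanity checks gate the copy to production) is a rough Python stand-in. The paths, the row-count threshold, and the mtime polling used here in place of real inotify are all assumptions for illustration.

```python
import csv
import os
import shutil
import time

def sane(path, min_rows=100):
    """Sanity check before promoting a lookup table: it must parse as
    CSV, carry an Affiliate ID column, and not have shrunk to a
    suspiciously small row count (a sign of a failed SQL extract)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header or "affiliate_id" not in (h.lower() for h in header):
            return False
        return sum(1 for _ in reader) >= min_rows

def promote(staging, production, seen):
    """Copy rewritten, sane lookup tables from staging to production.
    `seen` maps filename -> last promoted mtime; returns it updated."""
    for name in os.listdir(staging):
        src = os.path.join(staging, name)
        mtime = os.path.getmtime(src)
        if seen.get(name) != mtime and sane(src):
            shutil.copy2(src, os.path.join(production, name))
            seen[name] = mtime
    return seen

def watch(staging, production, poll_seconds=60):
    """Poll the staging directory for the 4-hourly refresh output."""
    seen = {}
    while True:
        seen = promote(staging, production, seen)
        time.sleep(poll_seconds)
```

A real deployment would subscribe to inotify's IN_CLOSE_WRITE event instead of comparing mtimes, so the copy fires exactly once per completed write.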
This is actually a follow-up to one of my recent posts about a webinar I attended by Unicon on learning analytics. We have representatives from three different LMSes: Moodle, Sakai, and Blackboard. Looks like Lou and Josh from that webinar are here…I’m looking forward to learning more about this effort! Word of warning: they moved fast, so I missed some detail, particularly around the workflow and data-heavy slides. My Student Affairs colleagues will want to tune into the question I asked at the end…
Open Learning Analytics: Context & Background
OAAI, the Open Academic Analytics Initiative: an EDUCAUSE Next Generation Learning Challenges (NGLC) project. Funded by the Bill & Melinda Gates Foundation, $250,000 over a 15-month period. Goal: leverage big-data concepts to create an open-source academic early-alert system and research "scaling factors"
LMS & SIS data is fed into a predictive scoring model, which is then fed into an academic alert report. From there, an intervention is deployed (“awareness” or Online Academic Support Environment – OASE)
Research design: rolled out to 2,200 students in 4 institutions: 2 community colleges, and 2 historically black colleges and universities. More detail on the approach and results here.
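The pipeline above (LMS & SIS features feed a predictive score, which feeds an academic alert report) can be illustrated with a toy logistic model. The features, weights, and threshold below are invented for this sketch; OAAI actually trained its models with open-source tools like Weka and R rather than hand-setting coefficients.

```python
import math

# Invented feature weights, for illustration only.
WEIGHTS = {
    "gpa": 0.9,                        # cumulative GPA from the SIS
    "logins_per_week": 0.15,           # LMS activity
    "assignments_submitted_ratio": 1.2,
    "forum_posts": 0.05,
}
BIAS = -3.0  # shifts the decision boundary

def risk_score(features):
    """Probability a student is at risk. Engagement raises z, and the
    logistic function is applied so more engagement means lower risk."""
    z = BIAS + sum(w * features.get(k, 0.0) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(z))

def academic_alert_report(students, threshold=0.5):
    """(student_id, score) rows for students whose risk crosses the bar."""
    return sorted(
        (sid, round(risk_score(f), 2))
        for sid, f in students.items()
        if risk_score(f) > threshold
    )

engaged = {"gpa": 3.5, "logins_per_week": 10,
           "assignments_submitted_ratio": 1.0, "forum_posts": 5}
disengaged = {"gpa": 1.8, "logins_per_week": 1,
              "assignments_submitted_ratio": 0.3, "forum_posts": 0}
```

The interesting engineering in the real project is everything around this function: extracting the features from two separate systems on a schedule, and routing the report to an intervention.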
Strategic Lessons Learned
Openness will play a critical role in the future of learning analytics.
- Used all open source tools: Weka, Kettle, Pentaho, R, Python, etc.
- Open standards and APIs: Experience API (xAPI), IMS Caliper/Sensor API
- Open Models: predictive models, knowledge maps, PMML, etc.
- Open Content/Access: journals, whitepapers, policy documents
- Openness or Transparency with regard to ethics/privacy
- NOT anti-commercial, commercial ecosystems help sustain OSS
Software silos limit usefulness
- Platform approach makes everything more useful
NC State Project
- Getting everyone moving in the same direction is a challenge.
- The number one priority we have at NC State is student success, and we know that data is going to help us get there. However, different vendors approach us independently, each with their own selling points on how they could help us.
- Lunch and learn sessions, bring people up to speed on what questions to ask, and start thinking about who can generate answers. It took us 10 months to get everyone together
- Division of Academic & Student Affairs has purchased EAB; concurrently, we’re working on LAP. Continued conversations with campus partners will have to happen.
From Proof to Production: Toward Learning Analytics for the Enterprise
- Initial steps: small sample sizes, predictions at 1/4, 1/2, 3/4 points in course, multi-step manual process
- Goal 1: make it more enterprise-y. Use large sample sizes (all student enrollments), frequent early runs (maybe daily), and automation requiring no more than 1 click
- Currently in progress: rebuild infrastructure for scale; daily snapshots of fall semester data; after fall semester ends look for the sweet spot.
- Future goals: refine the model even more; segment the model by population; balance between models and accuracy; refine and improve models over time; explore ways to track efficacy over time. Once we intervene, we can never go back to a virgin state.
- Why is JC seeking an LAP implementation? The first-time pass rate for Anatomy and Physiology is 54%. Only 27% re-take it. There's a 37% non-persistence (DFW) rate. We need to find ways to help students succeed.
- How is it going? We have a 4-year grant. The compliance letter came in May 2015. We implemented the PREP program in October 2015, with LAP roll-out on 10/1/2016 and one year to test. We use Student Participation System (SPS) data and feed it into the system.
- Why use SPS data? It’s readily available; part of HLC Quality Initiative; less politically charged; shown to correlate with student success; clear map of data schema; data is very robust, more data there than we are presently using; data is “complete” (better than Bb data; less complete than original LAP design).
- Each instructor will receive an Academic Alert Report.
My question: have you considered integration of co-curricular data into your models? YES! We’re very interested in integration of co-curricular data, because it’s often a better indicator for student success than LMS data. Vincent Tinto’s research clearly indicates this, but our implementation of this is probably a phase 3 or phase 4 thing.
This session will focus on innovations in using data insights in decision-making. What are the dos and don’ts that we’ve learned thus far. We’ll start with stories from each panelist, then go into Q&A. All material will be made available later (more to come on that).
- William Rainey Harper College: NW suburb of Chicago, a 2-year institution, 40,000 full-time-equivalent students
- "Project Discover," led by Matt McLaughlin. We got a Title III grant to help fund this project. It includes inclusion, engagement, achievement, onboarding, intervening, etc.
- Data has been collected over 6 years.
- We originally used a proprietary data warehouse
- Graduation rate increased 10% in 5 years
- New reactive programs: early alert, supplemental instruction, completion concierge, summer bridge.
- These were REACTIVE programs; we wanted PROACTIVE solutions.
- University of KY
- What have we learned? We’ve integrated virtually everything we can, and are now moving into personalized learning and messaging.
- Respect complexity in learning analytics! I recommend reading "Arrival of the Fittest," a book by Andreas Wagner. His research on genomics highlights models that can help our process. Instructional complexity is at least as great as that of genomics. We don't have just one paradigm of instructional theory, but dozens.
- Structure is important: get the right people on the bus, remove rivalries within your organization, give groups distinct and clear missions, align with organizational strategy.
- Engage the community: transparency makes a big difference; democratize analysis; enforce community etiquette, bring in students & faculty researchers; engage the broader higher education community.
- Use the right tools and techniques: speed enables fast thinking, fast group decision-making, fast everything; maximum semantic expressiveness and rich detail improves data quality, analytic flexibility; visualization is important.
- Conclusion: respect complexity, attend diligently to the very human aspects of this puzzle, ignite the passion of the community, choose and use your tools wisely
- I represent the Independent Colleges of IN
- A statute required that student record information be shared back with the state
- I needed to know how our institutions compared to others
- We worked with vendor partners (Dell & Statistica) to run descriptive and predictive analytics against the data we had
- We wanted to do card swipes, meal plans, and more for sub-group comparisons.
- The Statistica product has been made free for higher ed faculty and students
- I run the Statistica group at Dell
- We’ve done a lot of work in universities and hospitals
- We’re moving toward using data for real-time decision-making. A specific example was given about reduction in surgical infections…pretty powerful stuff.
“Maslow’s Hierarchy of Data Management”
- The spectrum: Data Management > Business Intelligence > Analytics
- The specific levels: Data Foundation > Basic Reporting > Performance Mgmt > Predictive > Prescriptive
Challenges and Observations
- Master organizational and technical planning, orchestrating organizational adoption.
- Bringing in the “executive management hammer” can be useful
- IR, advisor and counselor pushback, i.e. “you’re coming to take our jobs!” Dashboards and forms are actually a value-add for these folks that let them do their jobs more effectively.
- Usability testing and adoption feedback from students were interesting: “Why do you give us a number? Why don’t you just give us feedback and actions we can take?”
- ROL ("Return on Learning"): how can we quantify what we're seeing? There is no control group! The profound payoff is being able to make informed changes to policies that have real impact.
- Student subgroups with a GPA lower than X (not specified) were much more likely to stop out. This challenged many people’s beliefs, i.e. “how is this even possible?”
- University of Iowa cited an avoided cost of $31 million
- Data sharing with school districts for a full life-cycle on our students as they go through our system
- Classroom on realtime analytics, such as triggers set by faculty
- Get a handle on what our students do when they leave, i.e. wage data
- Improving the advising process
- Sharing findings with our institutions
One of the things I try to do when I attend conferences is make a detailed record of all the sessions I attend, with the exception of keynotes, which tend to get really good coverage from other folks. I live-blog the events as I attend them, which hopefully helps those who committed to other sessions, and then I do one of these "mega posts," which summarize all the sessions I attended. Based on my itinerary, 2013 seems to be the year of big data and analytics. I'm willing to bet a lot of my fellow attendees will agree 🙂
I’ve been in higher education for just over seven years now, and somewhat amazingly, this was the very first EDUCAUSE event I’ve ever attended. Why didn’t anyone tell me about this conference? It was an extremely worthwhile event, at least for me…one of the meetings I had will likely save my division close to $50,000 each year! That savings will go a long way toward providing students at CSUN with more and/or better services. There were lots of great sessions to attend, with lots of smart folks sharing what they’re doing with IT on their campuses. I’ll definitely be back next year.
Without any further ado, here’s my EDUCAUSE 2013 mega-post…please drop me a line and let me know if this helps you!
Friday, October 18 (last day of EDUCAUSE was a half day)
Thursday, October 17 (my busiest day)
Wednesday, October 16 (spent a few hours prowling the vendor floor and visiting with my accessibility colleagues)
Tuesday, October 15 (each session was a half-day long)
Title: Turning Big Data Analytics into Personal Student Data
- Shah Ardalan, President, Lone Star College System
- Christina Robinson Grochett, Chief Strategist – Innovation & Research, Lone Star College System
SLIDE: The Challenge
- Why is our educational ranking getting worse as technology becomes faster and bigger?
- Why is US GDP growth still hovering around 2.0%?
- Why hasn't the unemployment rate been reduced to an acceptable level?
- Why are there 4 million unfilled jobs in the U.S.?
SLIDE: The Buzz
- Cloud Computing
- Big Data
The assumption is that big data can solve our big problems
The DOE MyData Button
In October 2012, the DoE announced it would add a "MyData" download button to allow students to download their own data in a simple, machine-readable file that they can share at their own discretion with third parties that develop helpful consumer tools.
What it is: http://www.ed.gov/edblogs/technology/mydata/
The Technical Spec
- HEY, QUICK QUESTIONS: do students get hired off of data? NO
- …analytics? NO
- …reports? NO
- …documents? YES (transcripts, diplomas, resumes, etc.)
Education and Career Positioning System, MyEdu Vault
Self Assessment: values, interests, skills, personality type. Shows jobs available.
WOW, this is a lot like the Pathways tool my team built: https://pathways.studentaffairs.csun.edu/
Is this available for anyone? Yes. It’s available for $50 / year by the student, not the institution.