The Future Is Now: An Update on the CSU Data Lake


  • Brendan Aldrich, Chief Data Officer, CSU Office of the Chancellor

Gartner on Analytics & BI Strategy Key Findings

  • Use only a fraction of their data
  • Modern analytic tech does little to ensure deployment and use
  • Vis and interest have been transformed by AI, but QC has been under the radar for most orgs. Eventual impact may be equally significant

What is Dx (Digital Transformation)? Series of deep and coordinated culture, workforce, and tech shifts that enable new educational and operating models that transform an institution’s ops, strategic directions and value proposition.

Most folk are using the same tools, with small refinements over the last 30+ years.

Traditional Data Issues

  • Create a stable data history from source systems
  • We can answer questions that haven’t yet been asked
  • All our in-use data
  • It’s easy and fast to add new data
  • Focus on cleaning data in sources
  • Do the interesting easy stuff and curate the useful
  • Every team at every campus can iterate independently while maintaining order

Drop ALL data into the data lake > do transformations

CSU Data Lake: A Retrospective

  • June 2017: data lake prototype (data provided to CO collected and housed in SQL Server tables)
  • January 2018: CSU is 1st CA higher ed to appoint a Chief Data Officer

New BI/DW Sub-teams

  • Discovery Team: data lake architecture & functionality
  • Tomorrow Team: ETL & Modeling
  • FED Team: front end design
  • InfoSec: data privacy, protection and security

Data & Analytics Strategies Driving the Future: CSU Challenge: data is highly distributed across the system and not easily accessible/usable

Architectural Deep Dive

  • Shifting data from on-premise to cloud: Delphix. Data virtualization = secure, lightweight & portable data. Unique block mapping, block aware filtering, efficient compression, secure transfer.
  • Flashed through many complex slides with “enterprise-y looking” architectural diagrams, so I couldn’t effectively capture this info.
  • AWS – DMS: migrate DBs to AWS quickly & securely: homogeneous & heterogeneous DB migrations, continuously replicate with HA, streaming data to Amazon Redshift & S3, AWS Schema Conversion Tool, fast and easy to set up, supports widely used DBs.

Discovery Team architectural issue: Oracle vs. Amazon DDL. Oracle DDL does NOT go straight into Redshift. We created a “teleporter” process that converts the DDL and stores it in Redshift with the data.
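The slides did not show how the “teleporter” works, so the following is only a minimal sketch of the general idea of translating Oracle column types into Redshift ones. The mapping table and the function names (`convert_column_type`, `redshift_create_table`) are my own assumptions, not CSU's code, and a real converter would need to handle many more types, constraints, and edge cases.

```python
import re

# Hypothetical, deliberately incomplete Oracle -> Redshift type mapping.
TYPE_MAP = {
    "VARCHAR2": "VARCHAR",
    "NVARCHAR2": "VARCHAR",
    "NUMBER": "NUMERIC",
    "DATE": "TIMESTAMP",       # Oracle DATE also carries a time of day
    "CLOB": "VARCHAR(65535)",  # Redshift's maximum VARCHAR length
}


def convert_column_type(oracle_type: str) -> str:
    """Translate one Oracle column type, e.g. 'VARCHAR2(30 BYTE)'."""
    match = re.fullmatch(r"\s*(\w+)\s*(?:\(([^)]*)\))?\s*", oracle_type)
    if not match:
        raise ValueError(f"Unrecognized column type: {oracle_type!r}")
    name, args = match.group(1).upper(), match.group(2)
    target = TYPE_MAP.get(name, name)  # unknown types pass through unchanged
    if args is None or "(" in target:
        return target
    # Drop Oracle-only length qualifiers such as BYTE or CHAR.
    args = re.sub(r"\s+(BYTE|CHAR)\b", "", args, flags=re.IGNORECASE).strip()
    return f"{target}({args})"


def redshift_create_table(table: str, columns: list) -> str:
    """Build a CREATE TABLE statement from (column name, Oracle type) pairs."""
    body = ",\n".join(
        f"    {name} {convert_column_type(col_type)}" for name, col_type in columns
    )
    return f"CREATE TABLE {table} (\n{body}\n);"


if __name__ == "__main__":
    # Made-up table and columns, for illustration only.
    print(redshift_create_table(
        "students",
        [("emplid", "VARCHAR2(11 BYTE)"), ("gpa", "NUMBER(4,2)"), ("admit_dt", "DATE")],
    ))
```

Unknown types are passed through untouched here so that a failure shows up in Redshift rather than being silently mis-mapped; a production converter would more likely reject them outright.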

Cost optimization: $405/day, moved down $100/day

AWS is providing us with custom patches to improve DMS acceleration results.

We’re talking about creating reserved instances of DMS from Amazon to save costs.

Curated Student Collections

  • Students: student info
  • Students by Term: by terms they attended
  • by class: by enrolled classes
  • By degree: by degree(s) attained
  • by section: by class section offered
  • apps by applicant: by application submitted

Prototyping Tech

  • AWS: crawlers, data catalogs, glue
  • Airflow + Python: hand-crafted ETL platform
  • Alteryx, Matillion, Others: Visual ETL (new prototypes)

What do YOU Get When you Start Using All This?

Curated Data Sets

  • In the next 30 days: work with CIOs and heads of IR to ID participants
  • Data validation: no statewide normalization, does this look like what’s in your SIS?
  • The Goal: access to a set of curated data sets refreshed on a daily basis; once validated, we’ll give you the ETL code; we will assist and advise in implementing a campus environment, if desired

Data validation with Pentaho (shared a view of this tool)

Direct Data Lake Access

  • In the next 60 days: work with CIOs to ID initial participants
  • Looking for pilot campuses: 3-5 pilot campuses with rollout to all other campuses to follow
  • The Goal: direct access to stored copies of all source tables via data lake; campus teleporter: to help campuses spin up RedShift tables from files; we will assist and advise in connection and best practices

We’re about ready to get all of you involved!

Data Governance Orchestration

  • Cross functional data governance teams: 17 of our 23 campuses
  • Over the next 6 months we will start coordinating with those teams to actively help to share data gov practices and data dictionary definitions across campuses
  • Introducing our new Student Analytics PM: Angela Williams



  • Lisa Smith, Infrastructure Engineer, CSUN
  • Steven Fitzgerald, Professor & Director of META+LAB, CSUN
  • Jorge Ruiz, Infrastructure Engineer, CSUN

META+LAB experience: provide real-world experience to CSUN students

New project = new problem

  • Create a new environment that is highly available and highly scalable for the lowest cost possible
  • When hosting a new app an IT team needs to determine the best solution for hosting the app
  • We considered: upfront cost, scalability, end-user satisfaction

Cloud Computing

Practice of using a network of remote servers hosted on the internet to store, manage, and process data, rather than a local server

On-demand delivery of IT resources

Deployment Models

  • On-premise: physical in-house infrastructure
  • All-in-cloud: all virtualized cloud infra
  • Hybrid: on-premise and all-in-cloud

Pricing Models

  • On-demand: pay-as-you-go
  • Reserved: pay in advance and save the most (nothing up-front, partial up-front, all up-front)
  • Spot: savings up to 90%, dedicated (VERY CHEAP)

To Amazon, spot instances are unused “spare machines” nobody is using, and therefore wasting money from Amazon’s perspective.

AWS Terminology and Services

  • Region is a geographical area
  • Each region is made of 2 or more availability zones (achieves fault tolerance and stability, regions are isolated from one another)
  • You enable and control data replication across regions: when you distribute apps across multiple AZs, be aware of location-dependent privacy and compliance requirements.
  • Each AZ is made up of 1 or more data centers
  • VPC: Networking component; same functionalities of an on-premise network (subnets, route tables, network ACLs, etc.)
  • EC2 and ELB: virtual servers and elastic load balancing (distribute incoming traffic, adjust to rapid changes in network traffic by distributing across multiple EC2 instances in the cloud w/out manual intervention)
  • Lambda: fully managed serverless compute, zero-admin compute platform, lets you run code w/out provisioning or managing servers, pay only for the compute time you consume.
  • Amazon Databases: RDS (MySQL, Microsoft SQL Server, PostgreSQL, MariaDB, Oracle). High durability and highly available (multiple deployment types).

Availability on the Spot

  • How do we use cloud services to improve our infra and decrease costs for the client?
  • Make sure the infra was highly available and fault tolerant

Challenges of Spot Instances

  • Spot price fluctuates
  • Hourly price is based on demand
  • 2 minute interruption warning(!)
  • Typically used if you can afford interruptions

Forming Solutions

For EC2 Spot Instances, we use CloudWatch (an alarm triggers a Lambda function) and ELB to control traffic. An infrastructure diagram was shared.

CloudWatch constantly watches our instances. When someone is willing to pay more for our spot instances, it passes along the 2-minute warning so we can be proactive and do something about it. We call a Lambda function that spins up an additional spot instance for us, configured so that it registers with and works behind our ELB. We use AMIs (pre-built images) that are configured for our application.
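The presenters did not share their Lambda code, so here is a rough sketch of what a handler for the interruption warning might look like. It assumes the event has the `EC2 Spot Instance Interruption Warning` detail-type and that the load balancer uses a target group. `handle_spot_warning`, `RecordingClient`, and the AMI, instance type, and ARN values are hypothetical. The clients are passed in so the logic can be dry-run without AWS credentials; in a real Lambda they would be `boto3.client("ec2")` and `boto3.client("elbv2")`.

```python
SPOT_WARNING = "EC2 Spot Instance Interruption Warning"


def handle_spot_warning(event, ec2, elbv2, *, ami_id, instance_type, target_group_arn):
    """Launch a replacement Spot instance and register it with the load balancer.

    `ec2` and `elbv2` are expected to behave like the boto3 clients of the
    same names. Returns the new instance ID, or None for unrelated events.
    """
    if event.get("detail-type") != SPOT_WARNING:
        return None

    response = ec2.run_instances(
        ImageId=ami_id,                                 # pre-built application image
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        InstanceMarketOptions={"MarketType": "spot"},   # request Spot capacity
    )
    new_id = response["Instances"][0]["InstanceId"]

    # Assumption: a real handler would probably wait for the instance to reach
    # the running state before registering it; that step is omitted here.
    elbv2.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": new_id}],
    )
    return new_id


class RecordingClient:
    """Stand-in for a boto3 client that only records calls (dry runs only)."""

    def __init__(self):
        self.calls = []

    def run_instances(self, **kwargs):
        self.calls.append(("run_instances", kwargs))
        return {"Instances": [{"InstanceId": "i-replacement"}]}

    def register_targets(self, **kwargs):
        self.calls.append(("register_targets", kwargs))
        return {}


if __name__ == "__main__":
    warning = {"detail-type": SPOT_WARNING, "detail": {"instance-id": "i-old"}}
    print(handle_spot_warning(
        warning, RecordingClient(), RecordingClient(),
        ami_id="ami-hypothetical", instance_type="t3.small",
        target_group_arn="arn:hypothetical",
    ))
```

If the app sits behind a Classic Load Balancer instead, the registration call would differ, but the overall flow (warning in, replacement out, register with the balancer) stays the same.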

Final Thoughts

This solution allowed us to build an inexpensive and fault-tolerant infrastructure for

Q: How often have spot instances gone down for you? I don’t think we’ve changed our spot instance price for three months.

Q: Is there the possibility that both instances could go down? Yes, that can happen but it’s very rare. The 2-minute window is enough notice in our experience. We have an 18-month history of costs, and over 6 months it didn’t actually go up at all.

Q: can you do this for RDS as well? Yes.

Q: where are user sessions stored? Our application was an SPA (single page application); we never wrote to the local disk, it was always retrieved from the RDS instance. Load balancer will help to handle this.

Q: in Lambda, does your rule account for cost increases? YES! It queries current prices and selects the lowest cost one based on a bidding rule that we set, i.e. the least amount of money.
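To make that answer concrete, here is a small sketch of the kind of rule described: take recent spot prices, keep the latest quote for each instance type and availability zone, discard anything above the maximum we are willing to pay, and pick the cheapest. The record shape mirrors what I understand EC2's `describe_spot_price_history` to return (`InstanceType`, `AvailabilityZone`, `SpotPrice` as a string, `Timestamp`); the function name and the sample prices are made up.

```python
from datetime import datetime
from decimal import Decimal


def cheapest_offer(price_history, max_price):
    """Return the cheapest current offer at or below max_price, or None."""
    latest = {}
    for record in price_history:
        key = (record["InstanceType"], record["AvailabilityZone"])
        # Keep only the most recent quote per (instance type, AZ) pair.
        if key not in latest or record["Timestamp"] > latest[key]["Timestamp"]:
            latest[key] = record

    limit = Decimal(max_price)
    affordable = [r for r in latest.values() if Decimal(r["SpotPrice"]) <= limit]
    if not affordable:
        return None
    return min(affordable, key=lambda r: Decimal(r["SpotPrice"]))


if __name__ == "__main__":
    # Illustrative numbers only, not real AWS prices.
    history = [
        {"InstanceType": "t3.medium", "AvailabilityZone": "us-west-2a",
         "SpotPrice": "0.0125", "Timestamp": datetime(2019, 3, 1, 8)},
        {"InstanceType": "t3.medium", "AvailabilityZone": "us-west-2a",
         "SpotPrice": "0.0140", "Timestamp": datetime(2019, 3, 1, 9)},
        {"InstanceType": "t3.medium", "AvailabilityZone": "us-west-2b",
         "SpotPrice": "0.0131", "Timestamp": datetime(2019, 3, 1, 9)},
    ]
    print(cheapest_offer(history, "0.02"))
```

Prices are compared as `Decimal` values rather than floats so that string prices like "0.0131" are not subject to rounding surprises.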

Q: you have used spot instances exclusively for this app, or is anything in “on-demand” instances? Only our RDS is on an on-demand instance. And we monitor that with a separate Lambda function. You’ve found that there’s always a spot instance available? Yes. In our experience, it’s worked well. Prices are pretty steady, but you can monitor that.

Q: How long have you been running the app in this scenario? Since Feb/Mar.

NASPA Needs a Technology Core Data Service, and Why This Matters to You

Who You Gonna Call?

Who do you call when you have a burning question about technology? Chances are good you have a picture of “that one techie” in your mind right now. You know their name, and you probably have their extension memorized. Beyond that, your knowledge of who does what with technology on your campus likely gets hazy. If you’re part of a system of universities, you may rely on “birds of a feather” colleagues at other campuses you meet with on a regular basis. No doubt you have colleagues who use the same software as you to administer departmental programming, can quote verse about the hoops you have to jump through to get the data you need, how your staff deals with social media, and so on. If you’re lucky, you get to go to conferences and have an informal network of professionals to lean on. Wouldn’t it be nice if there was an unbiased resource you could rely on to provide benchmarking information about technology-related topics germane to higher education? Something like this actually exists…sort of.

What’s a Core Data Service (CDS), Anyway?

The idea for a multi-organizational technology assessment in higher education is not new or original, nor did it materialize out of thin air. Since 2002, EDUCAUSE – the world’s largest community of IT leaders and professionals in higher education – has conducted an annual assessment of hundreds of campuses. The activities around this assessment culminate in a product they call the Core Data Service, or CDS. What’s in it? Benchmarking data on staffing, financials and a variety of technology services. It’s a fantastic reference for higher education technology professionals, especially leaders who need to know where they stand with respect to their peers. The problem with the EDUCAUSE CDS is that it does not collect data or provide insights that are particularly useful to student affairs professionals.

Why NASPA Needs Its Own Version of a CDS

Members of the Technology Knowledge Community (TKC) recognized the importance of technology to the profession many years ago. They believed it was such an important part of our work that they were able to successfully add it as a NASPA Professional Competency Area in 2010. Unlike EDUCAUSE, NASPA has no benchmarking tool focused on technology that we are aware of. We believe that a NASPA CDS would be a valuable resource for any NASPA member who needs to make decisions about the use of technology in their programs. A Core Data Service is a natural extension of the assessment culture that has been built in our profession; we think it should be a core product of the organization.

You might be asking yourself “why don’t we just ask EDUCAUSE to adapt their instrument so it can collect this data for us?” First, the overlap between NASPA members who participate in EDUCAUSE and vice-versa is rather small…the connection between organizations is probably not where it needs to be to make this happen (yet). Second, the vast majority of the technology we use in student services – particularly software-based – is not universally important to everyone in our organizations. Third, technology staffing models vary drastically from campus to campus. Hopefully, EDUCAUSE will continue to evolve and the data needs for student affairs will be more fully included. Until that time, however, adapting the concept for our needs makes a lot of sense.

Enterprise Versus Niche Software

You may have heard the term “enterprise” invoked in hushed tones during campus meetings with IT and wondered what it meant. The way the word is used implies great importance. Generally speaking, “enterprise” refers to a product or service that everyone (or nearly everyone) in an organization depends on to do their job. When enterprise services go down, everyone panics. In the higher education software world, enterprise usually means the SIS (Student Information System), HR/Finance, portals, and email/calendaring tools. Enterprise software is expensive and complex, and requires a significant investment in professional IT resources. For many campuses, the responsibility for managing these systems lies with a Centralized IT department. As a general rule, enterprise software feeds, stores, and works on data that is considered to be the “source of truth” for an organization. They’re critical systems by definition.

Doesn’t every operational area in student affairs also depend on software? And isn’t that software just as important to what we do? In terms of complexity and usage, some of our systems rival enterprise software. Do you lead a Career Services department? There are software systems for you. How about Student Housing? You have multiple software options to choose from for managing residential life. Health Services? Check. Judicial Affairs/Student Conduct? Check. Clubs & Organizations? Disability Resources? Assessment? Check, check, check. Our software is important to us, but it isn’t universally important to everyone on campus. That’s what makes student services software niche software.

The bottom line here is that you probably want to know which software packages your peers use most often. It’s a reasonable question you’ve probably asked more than once.

Student Services Technology Support Varies Widely

Despite the fact that technology is enshrined as a NASPA professional competency, there’s little consistency around how we fund and staff it. Support models used by campuses to deliver student services technology vary widely (and wildly). Some campuses have a highly centralized IT division that coordinates services for every functional area on campus. Other campuses have multiple, decentralized technology units. Student affairs divisions may have a large or small technology department – or none at all – depending on the services needed. It’s fair to say that there are as many technology delivery models as there are members in the TKC!

We Have an Instrument That Just Might Work

In 2017, David Sweeney of the Texas A&M University system published the results of a system-wide student affairs software survey. This assessment provided TAMU’s Senior Student Affairs Officers with information about “…the distribution of ‘student affairs’ typical software packages and platforms…” and “…contract data with the aim of finding opportunities to share software across multiple units if indicated and desired.”* David’s survey spurred interest among several of us in the TKC in developing a similar but more expansive survey, with the intention of incorporating other pertinent details. After much discussion, we decided to measure the following:

  1. Institution (size, basic demographics)
  2. Student Affairs organization (services offered)
  3. Student Affairs IT (staffing level, type of support)
  4. Applications and Services

As a group, we felt that all four of these components would be useful for SSAOs (Senior Student Affairs Officers). We also felt that they would present a host of emergent benefits, such as improved collaboration between universities, leveraging our combined voices when communicating with vendors, providing hard data for NASPA’s assessment team, and so on. To that end, we developed a Qualtrics survey, currently hosted by the University of Pittsburgh. The survey is accessed by a link on the SAIT Pros web site. SAIT Pros is a free “non-denominational” association for people who do technology work in student affairs. You don’t have to be an IT geek to join, membership is free, and we host a Slack team where people can share what they know about products, services and processes, all without having to worry about vendors listening in. In our first year of running this assessment, we had 27 participating campuses, which indicates to us that our idea has merit. We asked for TKC sponsorship for a session to talk about this project at the national conference in Los Angeles, which the TKC granted. Thank you, TKC!

Our hope is that the TKC and the broader NASPA community also see value in a “NASPA Technology CDS.” Next steps include reaching out to the Assessment, Evaluation and Research Knowledge Community (AERKC) to identify potential improvements for version 2 of the survey and possible areas of collaboration with the TKC.

Paul Schantz is Director of Web & Technology Services for the Division of Student Affairs at California State University, Northridge. He currently serves as the EdTech representative to the TKC (NASPA), is the Chair of the Student Affairs IT Community Group (EDUCAUSE), and a co-founder of SA IT Pros.

A version of this post was originally published on the NASPA Technology Knowledge Community blog. This project was discussed during a technology session at the 2019 NASPA national conference in Los Angeles.


Next Gen VPSA


  • Josie Ahlquist, Research Associate and Instructor, Florida State University, @josieahlquist
  • Dr. Ed Cabellon, Vice President for Student Services and Enrollment Management, Bristol Community College, @dredcabellon
  • Mordecai Brownlee, Vice President of Student Success, St. Philip’s College, @ItsDrMordecai
  • Angela Batista, Vice President of Student Affairs and Institutional Diversity and Inclusion, Champlain College, @drangelabatista
  • Dr. Tim Miller, @JMUTimMiller


This is my first session of the 2019 NASPA conference, and I’m well-rested and ready to learn! When I saw the title “Next Gen VPSA,” I knew I needed to attend this session 🙂 Today’s agenda: facilitated discussion around “purpose-driven digital leadership.” Any omissions or errors are mine.

Change: digital leaders accept and embrace change, calling on others to fill knowledge and skills gaps with technology.

Connection: digital engagement for campus leaders is built around relationships for genuine community building

Personalization: A holistic approach humanizes both a leader’s campus position and their use of social media tools.

Strategy: campus leaders need to have a clear, yet flexible strategy that aligns their values and personality, as well as university objectives.

Legacy: the theory, practice, and pedagogy of leadership can be applied in a digital context to create meaning, build community and leave a legacy.

Question 1: How do you define “Next Gen VPSA?”

MB: more courses are moving online. SoMe is important for providing a level of representation of who you are and what your institution is about. It’s going to be a norm soon.

EC: I’m an early adopter and my research was around use of SoMe and tech by leaders in higher ed. When I became a VP nine months ago, I thought I’d be able to continue using SoMe the way I’d always used it…that came to a screeching halt! I’ve had to rethink how and why I use SoMe. It really helps when your president and board “get it.” I’m using MailChimp to help measure staff and student interest.

AB: being intentional and strategic is important. We need to be there for our staff and we need to keep learning. Our communication tools are most useful when we’re intentional about HOW we use them. Using it to share your true self is important because it appears in how you “show up.” I was able to respond to a student recently who had a less than ideal experience who said the campus did not care about students of color. Because I was on SoMe, I was able to respond directly to that student’s post.

MB: we’re able to respond in an immediate way…our students want to hear from us. These are opportunities for us to share that we see our students’ concerns, we hear our students’ concerns, and we care about them.

How do you balance your personal and professional accounts?

EC: I’m in a state role now. Because my FB account is intertwined with my personal life, I had to separate things. I do have an assistant that helps me out with things, but it’s still a lot of work to have multiple accounts.

JA: FB and Instagram allow you to have “branded pages” which are underneath the main institutional account.

AB: I intermingle my personal stuff with my professional stuff. I often will share articles, but that does not necessarily mean that I endorse them. If you’re going to do a branded page, make sure that it actually has value.

MB: make sure your SoMe has purpose! Really look at it! You need to evaluate what you’re looking at…ALL of it. You’re never “off” as a VPSA. SoMe is not a place to rant and rave.

EC: if you’re on Twitter, have a look at what lists you’re on. This is a good measurement of how people view you online.

MB: You need to have purpose behind your presence. You also need to be aware of what kind of interaction opportunities each platform presents. Some do not allow you to control things beyond the initial post. I am not an endorser.

JA: Instagram stories are the biggest ROI for younger people. Different intents for different platforms.

How much time do you spend on your SoMe?

TM: I have an assistant who I’ve given all my favorite books, and she provides motivational quotes M-Th, and I do things on Friday. I spend about an hour a day on mine.

AB: I spend most of my time on FB. I post at every event that I go to on campus, which helps with the student voice. Students who want me to amplify their voice, I ask them to tag me so that I can help them. It’s not about quantity, it’s about intent. It’s my way to build relationships.

MB: I spend less than 30 minutes a day on average. I check at the end of the day for sure.

How do you intentionally connect with staff and students?

AB: I don’t invite my staff to connect with me. If someone wants to connect, I really think about what that person wants from the relationship.

MB: if you’re a VP or senior student affairs officer, you should definitely have a conversation with your PR department. Be prepared to review whether your own personal material aligns with that of your institution.

How do you interact with your leadership team?

EC: Bring data to the table. Pick a platform that works best for your institution…even if it’s just one thing.

MB: I’m the only member of my cabinet that has a SoMe presence. You need to understand your campus culture…I push my president to be engaged with video and SoMe pictures.

AB: most of my colleagues are on SoMe, and they are growing their presence as a result of the posts that I’m making. In my opinion, it’s important to keep your opinions to yourself.

TM: I was the first on my cabinet to be on SoMe. Our PR team had an intervention with me. Students will pull you into very specific concerns…SoMe back-and-forth isn’t the place to resolve their concerns. However, I DO tell the students that I will meet with them individually to resolve their concerns.

Building Your Digital Transformation Ecosystem with LTI Advantage

This session moved pretty fast (and included some very dense slides which were impossible to capture in text), so any omissions or mistakes in my notes are entirely my fault!


  • Rob Abel, CEO, IMS Global Learning Consortium
  • Michael Berman, Chief Innovation Officer and Deputy CIO, California State University, Office of the Chancellor
  • Vince Kellen, Chief Information Officer, University of California San Diego
  • Jennifer Sparrow, Senior Director of Teaching and Learning Technology, The Pennsylvania State University


What is LTI Advantage and IMS Global?

LTI Advantage (and Insights – for analytics) is a strategy as much as an interoperability standard. It’s an integration standard for LMS and tools that connect to an LMS. It’s based on OAuth2 and JSON Web Tokens (JWT), plus extensions for names & roles provisioning, assignment and grade services, deep linking and custom extensions.
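To get a feel for what this looks like on the wire, here is a sketch that reads the payload of a launch token and reports which LTI Advantage services the platform is advertising. The claim URIs are the ones I understand the IMS specs to use and should be checked against the published documents; the function names and sample values are hypothetical. Note that this sketch decodes the token WITHOUT verifying its signature, which a real tool must never skip.

```python
import base64
import json

LTI_CLAIM = "https://purl.imsglobal.org/spec/lti/claim/"

# Claims that (to my understanding) signal each LTI Advantage service.
SERVICE_CLAIMS = {
    "names_and_roles": "https://purl.imsglobal.org/spec/lti-nrps/claim/namesroleservice",
    "assignment_and_grades": "https://purl.imsglobal.org/spec/lti-ags/claim/endpoint",
    "deep_linking": "https://purl.imsglobal.org/spec/lti-dl/claim/deep_linking_settings",
}


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def read_launch_claims(id_token: str) -> dict:
    """Decode a JWT payload. Signature verification is deliberately skipped."""
    _header, payload, _signature = id_token.split(".")
    return json.loads(_b64url_decode(payload))


def advertised_services(claims: dict) -> list:
    """List the Advantage services whose claims appear in the launch."""
    return sorted(name for name, uri in SERVICE_CLAIMS.items() if uri in claims)


def make_unsigned_token(claims: dict) -> str:
    """Build an unsigned demo token; real launches are signed by the platform."""
    header = {"alg": "none", "typ": "JWT"}
    parts = [json.dumps(header).encode(), json.dumps(claims).encode()]
    return ".".join([_b64url_encode(p) for p in parts] + [""])


if __name__ == "__main__":
    sample = {
        LTI_CLAIM + "message_type": "LtiResourceLinkRequest",
        LTI_CLAIM + "version": "1.3.0",
        SERVICE_CLAIMS["names_and_roles"]: {"context_memberships_url": "https://lms.example/members"},
        SERVICE_CLAIMS["assignment_and_grades"]: {"lineitems": "https://lms.example/lineitems"},
    }
    claims = read_launch_claims(make_unsigned_token(sample))
    print(claims[LTI_CLAIM + "message_type"], advertised_services(claims))
```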

There are 25 LTI Advantage early adopters, which include the usual suspects like D2L, Canvas, etc.

LTI Insights

Which LTI-enabled tools are being launched?

  • How frequently and when?
  • For which courses?
  • Are the tools actually being used? By how many unique users?
  • What are the usage trends?
  • What types of devices? Mobile?
  • Which LTI-enabled tools received PII, and what information is shared, exactly?
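The slides did not show how LTI Insights computes any of this, but most of the questions above boil down to simple aggregation over launch records. A minimal sketch, assuming a hypothetical log where each launch is a dict with `tool`, `course`, `user`, and `device` keys (none of these field names come from the presentation):

```python
from collections import defaultdict


def summarize_launches(launches):
    """Aggregate launch records per tool: volume, reach, courses, mobile use."""
    stats = defaultdict(
        lambda: {"launches": 0, "users": set(), "courses": set(), "mobile": 0}
    )
    for row in launches:
        entry = stats[row["tool"]]
        entry["launches"] += 1
        entry["users"].add(row["user"])
        entry["courses"].add(row["course"])
        if row["device"] == "mobile":
            entry["mobile"] += 1

    # Convert the working sets into plain, report-friendly values.
    return {
        tool: {
            "launches": entry["launches"],
            "unique_users": len(entry["users"]),
            "courses": sorted(entry["courses"]),
            "mobile_launches": entry["mobile"],
        }
        for tool, entry in stats.items()
    }


if __name__ == "__main__":
    # Made-up records for illustration.
    log = [
        {"tool": "quiz-tool", "course": "BIOL101", "user": "u1", "device": "desktop"},
        {"tool": "quiz-tool", "course": "BIOL101", "user": "u1", "device": "mobile"},
        {"tool": "quiz-tool", "course": "CHEM200", "user": "u2", "device": "mobile"},
        {"tool": "video-tool", "course": "BIOL101", "user": "u2", "device": "desktop"},
    ]
    for tool, summary in summarize_launches(log).items():
        print(tool, summary)
```

Usage trends over time and the PII question would need timestamps and a record of which claims were sent on each launch, which this sketch leaves out.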

Why is this important?

LTI addresses 5 of the EDUCAUSE 2018 Top 10 IT Issues. Our orgs are often working with hundreds of suppliers, and integration is a BIG challenge.

JS: If a tool is IMS-compliant, it’s much easier for us to fast-track tools into our ecosystem.

MB: in our case, our system is a lot more decentralized so we’re trying to explain the value that LTI brings to our campuses.

VK: we want to make sure that our entire edtech ecosystem is LTI-compliant. It’s complicated and it’s not owned by any one entity. Standards of integration will help us to deliver a better teaching and learning environment.

JS: having the data streams come out in a way that does NOT require a lot of manipulation is a huge benefit for us. It allows us to be more precise with our predictive analytics and helps us get our students to graduation.

RA: integration and analytics together – which LTI provides – allow us to do our jobs more effectively. Any supplier or institution can participate, which is probably unique to higher ed.

VK: data integration is a real rate limiter.

Question: what about extending LTI beyond the LMS, say, to the SIS? We’re working on that via the IMS EduAPI. EduAPI is a set of industry standard extensible APIs to support user provisioning, common source ID and administrative data exchange.