After Tech in Motion's webinar Machine Learning, Data Science in 2020: Navigating a Challenging Climate, you had some excellent follow-up questions on the topic. The webinar is on demand, but we asked our moderator Elijah Young (Director of Innovation, ATMECS), and panel of experts, Elizabeth Owen, Ph.D. (Data Science Consultant, Google), Alexis Roos (Senior Manager, Data Science, Twitter), and Curtis Bennett (Solutions Architect, Vertica) to continue to share their insights on the challenges and opportunities facing data and machine learning in 2020. Scroll down for the answers to all your burning questions about machine learning-data science.
Machine Learning, Data Science: Table of Contents
- How do you see the current climate fundamentally changing businesses and the approach to data and Machine Learning?
- As a result of this global change, what type of opportunities do you think will come to the forefront around solving more coherent data-driven problems?
- If someone is looking to leverage their data by standing up solid Machine Learning/Data Science practices, what does that journey look like at a high level?
- How do we ensure algorithm use is driving towards real, actionable insights? & How can complex Machine Learning results be made more visible to key stakeholders?
- When/where should I build my own data practice/ML team, & when/where should I buy or use an external service-based model?
Data Science Environment
- Why is it important to create a culture of data?
- Where is Machine Learning data typically housed, performed and optimized?
- Who’s responsible for data security/PII with regard to new laws like GDPR/CCPA etc?
- How should companies deal with their data environments today and how do you see that progressing in the future?
Machine Learning: Looking Ahead
- How do you see Machine Learning evolving in the next few years, any potential use cases that are interesting?
- What industries are most poised for disruption and why?
- What are some tools, platforms, technologies you recommend or places to learn more about this? What are some good next steps someone can take to continue their journey into Machine Learning, Data Science?
Machine Learning, Data Science:
Current Climate | Challenges, Solutions & Opportunities
1. How do you see the current climate fundamentally changing businesses and the approach to Machine Learning and data science?
Now, with COVID-19, businesses are updating their priorities and adapting to new realities. At a company like Twitter, for instance, that means we have to handle significant seasonal usage and accelerate an initiative around information integrity due to the spread of COVID-19 misinformation. We also have to operate with reduced human review capability, which means accelerating some of the work we're doing with machine learning. Every company is impacted differently and has to adapt to different challenges. For companies like Airbnb, for example, the entire business is altered and they need to come up with new services like online experiences. Alexis Roos
I do think it's fundamentally changing. From my perspective as a data and behavioral science expert in online systems and immersive digital environments, I think we'll see a lot more distance usage. My work with Google is on Stadia. Across entertainment, learning, and business, we'll see a lot more use of instant technology to facilitate communication with others and to accelerate business goals and learning. This whole environment highlights how connected we are. When we walk into a workplace, we take for granted the people sitting around us. Sitting in your own office at home accentuates the fact that we are all interconnected and need to foster ways to stay connected. Technology and data that support distance integrations and communication are going to be key here. Personalization of the digital environments that allow us to connect and learn is key. I think we will see a new wave of products that allow us to connect effectively and add personalization around the way we use these tools, and data science has a lot to do with supporting that initiative in the future. Elizabeth Owen
2. As a result of this global change, what type of opportunities do you think will come to the forefront around solving more coherent data-driven problems? Where should we look to find answers/make predictions, etc?
We need to be doubling down on connection and ways to connect. From a behavioral science perspective, people need to keep growing professionally to prepare themselves for future jobs that aren't here yet as a result of this changing climate. One thing that analytics can help us do is develop effective tools for communication and distance learning while leveraging our humanity and the ideas of others. In education, learning, and professional development, that means leveraging the expertise of other people effectively and supporting the growth of individuals and the whole business. Doubling down on that and finding ways for adaptive, personalized use of digital environments and connectivity software is going to be very helpful. How do we use data science to build better tools and make them effective and personalized, not just for ourselves, but for the whole generation of children growing up who are going to need different tools? In a way, this is their 9/11 in the sense that this is the thing that happened in their childhood that they'll remember and say 'wow, that changed everything.' The way we adjust to that as data experts is going to define their future and their learning. It's not just for our current business time, it's for the future as well. Children are already being taught online and adapting to the new tools and software. Elizabeth Owen
3. If someone is looking to leverage their data by standing up solid Machine learning/data science practices, what does that journey look like at a high level?
Whoever has the most data is going to win, and that's never been more apparent. When you look at all the challenges in data science and analytics, nothing has changed except that there is more data to be had. Curtis Bennett
This crisis has highlighted the need for transparency and quality of data. People's lives depend on the availability of data from antibody tests and tests for COVID-19. Social distancing measures depend on transparency and availability of data about where the virus is spreading and where it's not. Assimilating different sources of data into one highly visible space that everyone can use is a classic data science problem, but one we need to take very seriously because lives are at stake. Elizabeth Owen
4. How do we ensure algorithm use is driving towards real, actionable insights? & How can complex Machine Learning results be made more visible to key stakeholders? I.e., what am I getting for what I'm spending?
There have been challenges with traditional approaches to getting data into production and monetizing it. If you're not benefiting from your insights, you're not doing data science, you're doing science experiments. Curtis Bennett
The key is measurement. You want to measure your model when you deploy it. Make sure you have a way of measuring how the machine learning algorithm is working, whether it's through implicit or explicit feedback or random measurements. Even a model that's good initially could degrade, so before you put a model in production, you need a way of measuring and testing it. Alexis Roos
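To make the point about measurement concrete, here is a minimal Python sketch of one way to watch a deployed model for degradation. It is an illustration only, not something the panelists described: the class name `RollingAccuracyMonitor`, the window size, and the alert threshold are all hypothetical, and it assumes explicit feedback labels arrive for at least some predictions.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent labeled predictions.

    Hypothetical helper for illustration; window size and threshold
    are assumptions to tune for a real system.
    """

    def __init__(self, window_size=100, alert_threshold=0.8):
        # Each entry is True when a prediction matched its feedback label.
        self.outcomes = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, predicted, actual):
        """Store whether one prediction matched the feedback label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the window, or None before any feedback."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def is_degraded(self):
        """True when windowed accuracy has fallen below the threshold."""
        current = self.accuracy()
        return current is not None and current < self.alert_threshold


# Usage: feed (prediction, feedback label) pairs as they arrive.
monitor = RollingAccuracyMonitor(window_size=4, alert_threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.is_degraded())
```

A rolling window is used here so that recent feedback dominates; a model that was accurate at launch but has drifted will show up as a falling windowed score even if its lifetime average still looks healthy.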
5. When/where should I build my own machine learning/data practice team, & when/where should I buy or use an external service-based model?
You can look at it through the lenses of cost and risk, but with machine learning and AI, the additional twist is how sensitive and strategic your data is. Should we trust external vendors like Windows, which may become competitors? Could you develop something more accurate by customizing based on the uniqueness of your data and problem, in a way that would make sense for the business going forward? If not, then you should probably use what's already out there. Alexis Roos
Data Environment | Importance & Best Practices
1. Why is it important to create a culture of data science? (data processing, data integrity, & usage/scale)
If we're talking about a culture of data or data science, I think we need to ask: what do we mean by data science? We talk about data science as this big umbrella term. What does it really mean? I think the answers to the previous question get at the very important point that there's a whole way of systems thinking about data that is critical. So, how do we get at ROI? We look at what your research question or driving insight is. That should drive everything. And in the vein of Agile development and Scrum practices: what is your user story for that information? Who is going to be consuming this information and why? A classic case, again with the Coronavirus, is who needed this information and why. That particular user story wasn't focused on until the virus had spread quite a bit, and that cost a lot.
As a parallel to how it would cost a business money, this has cost people their lives, which is much more serious and much more important. It just highlights the big picture. Being a data scientist doesn't mean you're just running algorithms. You can run an algorithm, but you need to understand what it does, whether it has goodness of fit, and whether the result is rigorous. Results are not insights. Those are two different things.
Is your model output good? Does it address your research question or your key insight? Does it provide value for the audience that needs it? Can you visualize that information? The systems thinking around that is really important because otherwise we get lost in the sea of data and we can run algorithm after algorithm with no real insight.
Data scientists need to be able to visualize results in a way that the end user can actually make use of them and so that whole ecosystem is part of scientific thinking. I actually don't view writing an algorithm or running a script as science. It's not science, it’s just tinkering with a line of code. To me, the scientific thinking is around the big picture and making sure you're connecting with people that matter for your key insights and rigor around applying and testing algorithms.
That being said, I think a culture of data is important for data scientists working in any environment, and so is making insights and data tools accessible to people who do not have data science expertise. When you're trying to produce data-driven products like an app, game, or learning environment, building in data during production is key. Data can't be an afterthought. Data can't be the thing that you think about after everything is shipped. Data has to be QA'ed and have data integrity. When you're in a production environment, you need to make things accessible to people who are not data scientists and, as a data scientist, have that larger systems thinking. Elizabeth Owen
2. Where is Machine Learning data typically housed, performed and optimized?
Most of the data we're dealing with is public like most companies. The data sets are widely accessible within the company, except for some of the metadata, around which there are clearly a lot of rules and restrictions. In most companies, you have many more silos and many struggle with data lag or being able to configure data. Twitter is interesting because a lot of data is already there. There is a lot of AI in our data because, at the end of the day, it's really a data company. All we are producing is data and data services, and there are a lot of use cases around keeping quality data on the platform and making sure you don't have people gaming the system. There's a spectrum of services related to that. Alexis Roos
3. Who’s responsible for data security/PII with regard to new laws like GDPR/CCPA etc?
This brings up a lot of the challenges in big data. We've all got more data than we can manage, and GDPR and other policies have exposed a lot of the issues inherent in the system. For example, we've got clients who want us to back up and restore data to be able to delete an individual's record going back seven years. It's created a lot of interesting problems for which no solution previously existed. If you have a tape library sitting in a data center somewhere that's got seven years of tape backups on it, how do you go back and remove John Smith from that set of records seven years before? It's a massive problem that people are still struggling with, and it's created a lot of interesting opportunities and challenges for companies. Curtis Bennett
4. How should companies deal with their data environments today and how do you see that progressing in the future?
GDPR is in some ways like the Coronavirus: it's exposed a lot of the inherent flaws in the system. Not only do we have situations where I have to go back and do those kinds of things, but the Coronavirus and all the economic ramifications of what's going on right now have brought to light things that weren't working very well before. It's somewhat fascinating to see how companies are pivoting so quickly and trying so hard to turn on a dime. For example, I like to play Pokemon GO, which is by nature a social game that involves going out in public. Within days of the lockdown, Niantic had to make it an experience that was playable at the individual level. It's affecting everything from the games that we play to the data that we keep. Curtis Bennett
Part of my work has been in learning games and the personalized response of game environments to the user. It's interesting because we're seeing a huge expansion of massively multiplayer online games and people signing on to them, and we're seeing parallel floods of usership and data coming from similar environments. It'll be interesting to see how we do there. I think one big thing that's coming up, which is a real trial by fire for people, is data systems. Can you sustain this user load, and do you have your data in a format that is easily consumable by your product team and by the systematized analytics and algorithms you have in place, so that it scales to this level of usage?
One thing when we talk about data integrity and the format of data is that analysts need to have a say in the way the data is structured and pulled out. There are several layers of engineering architecture, but the data needs to come out of the system ready to use in whatever form it needs to be consumed. That is a key concern to consider when you're setting up a data system. It doesn't matter how much data you collect if you can't use it or clean it. This is the question that needs to be addressed at a large scale. Your data needs to be usable, interpretable, and ready for consumption when it comes out of the system because that leads to easier integration of data sources. This is something that non-data nerds don't really like to hear about. When you go to your CEO and talk about the need for data integrity, the eyes glaze over, right? It's not very sexy to talk about and nobody really wants to think about it, but it is 80% of the work. Elizabeth Owen
Looking Ahead | Evolution, Disruption, & Future Predictions
1. How do you see Machine Learning evolving in the next few years?
From a tech side, there has been tremendous progress in the last few years. With techniques such as data augmentation, you can do ML with less data. This means that ML is becoming more approachable from a development standpoint. In addition to needing less data, we have AutoML, with capabilities to build a system almost automatically. Because we are using less data, the number of applications for machine learning has expanded, making it applicable to pretty much any industry and anything you can think of. Alexis Roos
2. What industries are most poised for disruption and why?
Every day is like a huge network test for phone companies and internet providers due to massive amounts of usage that they've never seen before. Companies like Zoom and GoToWebinar are struggling with the demand and usage of their systems, whereas companies like Airbnb are trying to figure out a way to monetize their service. We're also seeing forced innovation in the sense that industries which never previously considered employees working from home are finding that it works okay for them now. Curtis Bennett
Companies that are utilizing a chatbot service and being adaptive with AI will have more capability and a competitive advantage right now. Alexis Roos
Almost every company that deals with distance learning, telecommunication, or ways of connecting people across different contexts is going to feel this data burden and be forced to optimize its systems for this kind of scale. From a larger perspective: What data do you need to collect? Who is it for? What are your algorithms? What are your goodness metrics? And how do you visualize and utilize that, even if it's for product development? How do we develop the personalization of the tool itself to the user in products such as educational systems, games, professional development, teleconferencing software, and health care? We're seeing a shift to virtual appointments now that people aren't going into the doctor's office.
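As a rough illustration of how data augmentation lets you do ML with less data, the Python sketch below triples a tiny labeled image set by adding flipped copies. It is a sketch under stated assumptions, not a description of any panelist's system: the function names are hypothetical, images are plain nested lists rather than arrays from a real library, and it assumes flipping does not change the correct label, which is not true for every task (mirrored digits or text, for example).

```python
def flip_horizontal(image):
    """Mirror a 2D grid of pixel values left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror a 2D grid of pixel values top to bottom."""
    return [row[:] for row in image[::-1]]


def augment(dataset):
    """Return each original example plus two flipped variants.

    Assumes the label is unchanged by flipping (hypothetical; verify
    this holds for your task before using the extra examples).
    """
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((flip_horizontal(image), label))
        augmented.append((flip_vertical(image), label))
    return augmented


# Usage: one labeled 2x2 "image" becomes three training examples.
dataset = [([[1, 2], [3, 4]], "cat")]
augmented = augment(dataset)
print(len(augmented))
```

Real pipelines typically apply such transformations randomly at training time and use many more of them (crops, rotations, noise), but the principle is the same: each labeled example yields several plausible variants.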
When you set up these kinds of systems to be able to use your data again, you need a data infrastructure and a data labeling system that make your data interpretable and usable upfront. When we scale, we can be efficient if data is labeled clearly and consistently across the system. First, different data sources can be integrated in a more easily consumable way. Second, if the data coming in is clearly and consistently labeled, it is ready for feature engineering and analysis at scale.
When we talk about these challenges, we have to talk about scale. I mean that's what's happening. We are in a global economy. We are in a fully connected world at this point. Look at the analogy of the spread of the virus - it's worldwide. Everybody is connected and we have to deal with that and then confront it. I think it will be very positive in the end in terms of the impact on optimizing systems and thinking through data from a systems perspective. Elizabeth Owen
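The point above about clear, consistent labeling making data sources easier to integrate can be sketched in a few lines of Python. Everything named here is a hypothetical assumption for illustration: the two sources (`web`, `mobile`), their raw field names, and the shared schema of `user_id`, `timestamp`, and `event`.

```python
# Hypothetical per-source field names mapped onto one shared schema.
FIELD_MAPS = {
    "web": {"uid": "user_id", "ts": "timestamp", "evt": "event"},
    "mobile": {"userId": "user_id", "time": "timestamp", "action": "event"},
}


def normalize(record, source):
    """Rename a raw record's fields to the shared schema and tag its source."""
    mapping = FIELD_MAPS[source]
    missing = [raw for raw in mapping if raw not in record]
    if missing:
        # Fail loudly so inconsistent data is caught before analysis.
        raise ValueError(f"{source} record is missing fields: {missing}")
    normalized = {shared: record[raw] for raw, shared in mapping.items()}
    normalized["source"] = source
    return normalized


def integrate(batches):
    """Combine records from several sources into one time-ordered list."""
    combined = [
        normalize(record, source)
        for source, records in batches.items()
        for record in records
    ]
    return sorted(combined, key=lambda r: r["timestamp"])


# Usage: two differently labeled sources end up in one consistent stream.
events = integrate({
    "web": [{"uid": "u1", "ts": 20, "evt": "login"}],
    "mobile": [{"userId": "u2", "time": 10, "action": "login"}],
})
print(events)
```

Once every record carries the same field names, downstream feature engineering can be written once against the shared schema instead of once per source.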
3. What are some tools, platforms, technologies you recommend or places to learn more about this? What are some good next steps someone can take to continue their journey into Machine Learning, Data Science?
There are a lot of students getting into the data science discipline. The tools will become more democratized and easier to use, and there are already some great YouTube videos and articles to learn from. The thing that I found to be challenging is not learning what a linear regression is or how it works, but understanding the challenges around the data, how it's integrated within the project, and how you need to present the data for that purpose. If you're trying to learn data science, I would suggest finding a mentor who has been through those problems and can help you with some of the interesting challenges that come from the gray areas between the tools, which make data science tricky. Curtis Bennett
There’s never been a better time to learn because you have cloud-based platforms like Coursera that allow you to code data with notebooks and open source frameworks like Apache Spark which are very easy to learn for free. The scope of AI is large so it's good to know what you want to specialize in whether it’s data analytics, data engineering, machine learning, deep learning, or even research. Whatever you choose, it's good to get an appreciation of data first and start specializing from there. Alexis Roos
There are a lot of platforms for personalized learning, such as Codecademy or DataCamp, where you can write code that is evaluated in real time. This is a nice tool for a personalized application of data that allows adaptivity to the learner. This is helpful for building your personal code base because these platforms are accessible to learners at all levels. Elizabeth Owen
Join an online community where you can be embedded and jump in at whatever level. I would follow leaders in data science like our panelists. I would follow leaders on LinkedIn and ask questions or read their articles to get a baseline understanding of your personal interests and how to shape the education and learning you want to pursue for your career growth in ML. Elijah Young
Check out our Tech Market Insights page to learn more about current tech trends, access candidate resources, or participate in a future webinar.